ABSTRACT


The CORDIC algorithm is an accurate way to compute the value of a function
like sin(x), for a given value of x. However, it is iterative and slow. In this thesis, we
show that a wide class of arithmetic functions can be realized on the SRC-6, a
reconfigurable computer, using polynomial approximations. The function is realized by
partitioning its domain into segments and then approximating the function in each
segment by a quadratic polynomial. This is not an iterative approach, and so it is faster
than the CORDIC algorithm.

Two  approximation  methods  are  implemented.  In  one  method,  non-uniform 
segments  are  used.  Here,  larger  segments  can  be  used  where  the  function  is  close  to 
quadratic,  while  highly  non-quadratic  regions  require  smaller  segments.  This  approach 
minimizes the number of segments. In the other method, uniform segments are used.
Although more segments are needed than in the non-uniform method, the circuit is
simpler. 

We  show  that  accuracies  of  up  to  33  bits  are  possible.  A  pipelined  circuit  was 
built  on  the  SRC-6  in  two’s  complement  and  floating  point.  We  also  show  an  efficient 
algorithm  for  segmenting  the  function,  which  is  faster  than  previous  methods. 
EXECUTIVE SUMMARY


This thesis focuses on the high-speed implementation of arithmetic functions,
such as sin(πx), ln(x) and 2^x. Meteorological computations, scientific calculations and
graphics are applications that require fast mathematical computation.

The CORDIC algorithm and Taylor series expansion are methods used to
compute trigonometric functions. The CORDIC algorithm is hardware efficient and precise,
but it is iterative in design and therefore slow.

In  this  thesis,  we  investigate  a  way  to  speed  up  mathematical  computations  by 
using  piecewise  quadratic  approximations  built  on  reconfigurable  hardware.  The 
function  is  realized  by  partitioning  its  domain  into  segments  and  then  approximating  the 
function  in  each  segment  by  a  quadratic  polynomial.  This  is  not  an  iterative  approach, 
and so it is faster than the CORDIC algorithm.

The reconfigurable hardware used is the SRC-6E, designed by SRC
Computers in Colorado Springs, Colorado.

The  objectives  were  to: 

•  Find  an  efficient  algorithm  to  segment  any  numeric  function  using 
piecewise  quadratic  approximations. 

•  Find  an  accurate  segmentation  (accurate  when  evaluated  using  the 
approximation  polynomial)  to  any  numeric  function  given  an  accuracy 
constraint  in  terms  of  number  of  bits. 

•  Design  pipelined  hardware  for  the  Numeric  Function  Generator  (NFG) 
with  a  small  pipeline  depth  (compared  to  what  is  currently  available). 

•  Design  NFG  to  operate  at  100MHz  or  faster  on  the  FPGA. 

Segmentation  is  a  preliminary  step  to  provide  a  memory  file  that  contains  the 
number  of  segments  for  the  numeric  function,  and  each  segment’s  coefficients  needed  to 
compute  the  approximation  polynomial. 
MATLAB is used to segment any function over a defined interval. The
MATLAB program needs to know the function, the interval, the desired accuracy and the
number of discrete points in the interval. The MATLAB built-in function Polyfit was
initially used to compute the coefficients of the approximation polynomial. Polyfit is
computationally fast, but analysis showed that the approximations it computes do not
segment the function efficiently.

The Remez algorithm is used to efficiently segment the numeric function. The
Remez algorithm evenly distributes the approximation error on each segment, but is
computationally intensive and slow. Several methods were investigated to speed up the
algorithm. The best approach was a hybrid of three methods:

•  Segment  width  estimation  that  requires  the  third  derivative  of  the  numeric 
function  and  the  accuracy  desired  by  the  user. 

• Search algorithm similar to a binary search.

• Single stepping through points and testing to determine if the accuracy has
been met.

The  program  computes  an  estimated  segment  width  and  a  metric  is  used  to 
determine  the  quality  of  the  estimation.  If  the  metric  indicates  the  estimation  quality  is 
poor,  then  the  program  will  use  the  search  algorithm  to  get  closer  to  the  optimum  width. 
In  the  final  step,  the  program  single  steps  through  the  points  and  tests  each  approximation 
to  determine  when  the  accuracy  has  been  met.  When  the  segmentation  of  the  function  is 
complete,  the  optimum  segment  width  and  the  associated  coefficients  are  saved  in  a 
memory  file  for  use  in  the  NFG. 
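The three-stage search described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's MATLAB program: the width estimate w ≈ (192ε / |f'''|)^(1/3) comes from the standard error bound for a degree-2 Chebyshev approximation, interpolation at Chebyshev nodes stands in for the Remez algorithm, the quality metric is replaced by a simple pass/fail test of the estimate, and all function names are hypothetical.

```python
import math

def cheb_quadratic(f, lo, hi):
    """Quadratic through f at the 3 Chebyshev nodes of [lo, hi] (near-minimax stand-in for Remez)."""
    mid, half = (lo + hi) / 2, (hi - lo) / 2
    xs = [mid + half * math.cos((2 * k + 1) * math.pi / 6) for k in range(3)]
    ys = [f(x) for x in xs]

    def p(x):  # Lagrange form of the interpolating quadratic
        total = 0.0
        for i in range(3):
            term = ys[i]
            for j in range(3):
                if j != i:
                    term *= (x - xs[j]) / (xs[i] - xs[j])
            total += term
        return total
    return p

def meets_accuracy(f, lo, hi, eps, samples=200):
    """One 'test': fit the segment and check the error at sample points."""
    p = cheb_quadratic(f, lo, hi)
    return all(abs(f(x) - p(x)) <= eps
               for x in (lo + (hi - lo) * k / samples for k in range(samples + 1)))

def segment_end(f, f3, lo, b, eps, step):
    """Right endpoint of the widest segment starting at lo (hybrid search)."""
    # Stage 1: width estimate from the third derivative and the desired accuracy.
    d3 = abs(f3(lo))
    guess = (192.0 * eps / d3) ** (1.0 / 3.0) if d3 > 0 else b - lo
    hi = min(b, lo + max(guess, step))
    good = min(b, lo + step)        # assumed to pass: a single grid step
    if meets_accuracy(f, lo, hi, eps):
        good = hi
    else:
        # Stage 2: binary-search-like narrowing between a passing and a failing width.
        bad = hi
        while bad - good > 8 * step:
            mid = (good + bad) / 2
            if meets_accuracy(f, lo, mid, eps):
                good = mid
            else:
                bad = mid
    # Stage 3: single-step through the grid until the accuracy is no longer met.
    while good < b:
        nxt = min(b, good + step)
        if not meets_accuracy(f, lo, nxt, eps):
            break
        good = nxt
    return good

def segment(f, f3, a, b, eps, n_points):
    """Greedy non-uniform segmentation of [a, b]; returns the right endpoints."""
    step = (b - a) / n_points
    ends, lo = [], a
    while lo < b:
        lo = segment_end(f, f3, lo, b, eps, step)
        ends.append(lo)
    return ends

if __name__ == "__main__":
    f = lambda x: 2.0 ** x
    f3 = lambda x: math.log(2.0) ** 3 * 2.0 ** x   # third derivative of 2^x
    print(len(segment(f, f3, 0.0, 1.0, 2.0 ** -17, 10000)), "segments")
```

Because the estimate usually lands close to the final width, only a handful of fit-and-check tests are needed per segment, rather than one test per grid point.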

The segmentation algorithm sped up the program tremendously. If the domain is
divided into over a million points, the original program would take at least one million
tests to segment a function. In each test, the program computes the coefficients and tests
the polynomial against the numeric function to see if the accuracy is met. When the
speed-up algorithm is used, the program requires much less than 0.1% of the tests
needed without it. Table 1 shows the results when 15 functions were tested.
The interval is shown in the second column, the speed-up is shown in percentage format
in the third column and the last column shows the number of segments. The percentage
is computed as:

\[ \%\ \text{of tests} = \frac{\#\ \text{of tests} \times 100}{1{,}000{,}000} \]


Epsilon = 0.0000000596 = 2^-24.0
N = 1000000

Function            Interval            % of tests    # of Segments
2^x                 [0,1]               0.00910        35
1./x                [1,2]               0.01020        50
sqrt(x)             [1,2]               0.00750        24
1/sqrt(x)           [1,2]               0.00720        36
log2(x)             [1,2]               0.00900        44
log(x)              [1,2]               0.00780        39
sin(pi*x)           [0,1/2]             0.01990        58
cos(pi*x)           [0,1/2]             0.01740        58
tan(pi*x)           [0,1/4]             0.01240        58
sqrt(-log(x...      [1/512,1/4]         0.04070       163
tan(pi*x).^...      [0,1/4]             0.02180        79
-(x*log2(x)...      [1/256,1-1/256]     0.04710       183
1/(1+exp(-x...      [0,1]               0.00920        20
(1/sqrt(2*p...      [0,sqrt(2)]         0.01670        45
sin(exp(x))         [0,2]               0.07810       265
********************************************************

Table 1. Speed-up in computation time for 15 functions (expressed as a percentage
of the time needed when the domain is divided into 1,000,000 points)
for ε = 2^-24.


The  NFG  circuit  consists  of  three  multipliers,  one  3-input  adder,  a  segment 
indexing  method  and  the  memory  that  contains  the  approximation  polynomials’ 
coefficients  for  each  segment. 
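As a rough software model of this datapath, the sketch below evaluates c2·x² + c1·x + c0 with three multiplications and one three-input addition. The arrangement of the multipliers, the breakpoint list and the name `nfg_eval` are assumptions made for illustration; the actual circuit is the one shown in Figure 1.

```python
import bisect

def nfg_eval(x, breakpoints, coeffs):
    """Evaluate a piecewise quadratic c2*x^2 + c1*x + c0.

    breakpoints: sorted left endpoints of the segments (hypothetical layout).
    coeffs: one (c2, c1, c0) tuple per segment, as would be read from the memory file.
    """
    # Segment index: the last breakpoint that is <= x (stands in for the indexing circuit).
    i = max(bisect.bisect_right(breakpoints, x) - 1, 0)
    c2, c1, c0 = coeffs[i]
    x_sq = x * x           # multiplier 1
    t2 = c2 * x_sq         # multiplier 2
    t1 = c1 * x            # multiplier 3
    return t2 + t1 + c0    # 3-input adder

if __name__ == "__main__":
    # Two illustrative segments: x^2 on [0, 0.5) and x - 0.25 on [0.5, 1].
    bps = [0.0, 0.5]
    cs = [(1.0, 0.0, 0.0), (0.0, 1.0, -0.25)]
    print(nfg_eval(0.25, bps, cs), nfg_eval(0.75, bps, cs))
```

Changing the realized function only changes `breakpoints` and `coeffs`; the evaluation code, like the hardware, stays the same.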

Figure  1  is  a  block  diagram  that  shows  an  overview  of  the  NFG  circuit. 
Figure 1. Numeric function generator (NFG) architecture.


Two  approximation  methods  are  implemented.  In  one  method,  non-uniform 
segments are used. Here, larger segments can be used where the function is close to
quadratic, while highly non-quadratic regions require smaller segments. This approach
minimizes the number of segments. In the other method, uniform segments are used.
Although  more  segments  are  needed  than  in  the  non-uniform  method,  the  circuit  is 
simpler. 

We  show  that  accuracies  of  up  to  33  bits  are  possible.  A  pipelined  circuit  was 
built on the SRC-6 in two’s complement and floating point. The floating point
implementation is easier to program via the interface that SRC provides. A
<subroutine>.mc file is a C-like file that is compiled into the hardware that resides on the
FPGAs in the SRC Multi-Adaptive Programming (MAP) board.

Using  fixed  point  implementation  produces  a  shorter  pipeline  depth 
(approximately  30%  of  the  floating  point  pipeline  depth),  but  requires  more  effort  by  the 
programmer  to  ensure  the  bits  are  aligned  correctly.  In  fixed  point  implementation,  the 
bits  are  truncated  instead  of  rounded.  This  introduces  errors  in  the  intermediate 
computations  that  propagate  to  the  final  answer. 

The  best  solution  to  this  problem  is  to  build  a  user  macro  multiplier  that  takes  care 
of  the  rounding  and  ensures  the  bits  are  aligned  in  the  intermediate  results  of  the 
polynomial  computation. 
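The difference between truncating and rounding a fixed-point product can be illustrated with integers. The 16-bit fractional format and the function names below are hypothetical choices for this sketch; they are not the word widths or the macro used on the SRC-6.

```python
FRAC_BITS = 16  # hypothetical Q-format: 16 fractional bits

def to_fixed(x):
    """Convert a real value to a fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))

def mul_trunc(a, b):
    # The full product has 2*FRAC_BITS fractional bits; the shift simply discards the low half.
    return (a * b) >> FRAC_BITS

def mul_round(a, b):
    # Adding half an LSB before the shift rounds the result to the nearest representable value.
    return (a * b + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

if __name__ == "__main__":
    a, b = to_fixed(0.3), to_fixed(0.7)
    exact = a * b / (1 << FRAC_BITS)            # exact product, in units of one LSB
    print("truncation error (LSB):", exact - mul_trunc(a, b))
    print("rounding error   (LSB):", exact - mul_round(a, b))
```

Truncation always errs in the same direction by up to one LSB, so the errors of successive multiplications tend to accumulate; rounding keeps each error within half an LSB with either sign.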
I. INTRODUCTION


A.  PROBLEM  STATEMENT  AND  PURPOSE 

High-speed  numeric  computation  has  many  applications  including  digital  signal 
processing,  graphics  rendering,  meteorological  modeling,  etc.  These  applications  require 
numeric  calculations  to  be  computed  quickly.  In  addition,  the  hardware  may  be  required 
to compute large amounts of data or streaming data, which means long periods of time
may be expended performing one type of computation. Personal computers are
general  purpose  and  not  specifically  designed  for  numeric  calculations  alone;  instead  they 
provide  the  best  compromise  between  speed  and  flexibility. 

The  CORDIC  algorithm  can  be  very  precise,  but  it  has  the  disadvantage  of  being 
iterative  and  slow;  the  operations  can  take  hundreds  to  thousands  of  clock  cycles.  Each 
iteration in the CORDIC algorithm provides increased accuracy at the output [4].

It  would  be  beneficial  to  have  specialized  and  fast  hardware  for  high  speed 
numeric  calculations.  Conventional  methods  for  computing  numeric  functions  include 
the  CORDIC  algorithm  [2],  [3],  [4].  The  problem  is  that  specialized  hardware  is 
inflexible  to  computing  different  numeric  functions  as  well  as  to  changes  in  requirements 
or  software  updates.  However,  specialized  hardware  is  fast. 

A  very  fast  method  for  numeric  calculations  is  a  look-up  table  [5],  i.e.  for  every 
possible  input,  store  the  desired  output  of  the  numeric  function.  The  disadvantage  of  this 
approach  is  that  a  large  amount  of  memory  is  needed. 

Field  programmable  devices  have  the  advantage  that  one  can  quickly  design,  test 
and  replace  hardware  functionality.  This  is  compared  to  traditional  methods,  whereby  a 
prototype  is  designed  and  simulated  in  software,  prototyped  on  a  prototyping  board,  and 
then  sent  to  a  manufacturer.  This  is  expensive  and  time  consuming,  especially  if  there 
are  changes  required. 

FPGA  technology  has  improved  to  the  point  that  a  large  amount  of  logic  is 
available. If we have a few divergent needs that require particularly heavy
computation, best solved by specialized hardware, we can use FPGA
devices to implement a specialized hardware design. Once the task has been completed,
the hardware can be reconfigured for other uses. The NFG we will discuss uses this
principle on the SRC-6 computer system.

Lee, Wayne, Villasenor and Cheung [6] used a cascade of AND and OR gates to
calculate  segment  addresses  in  a  non-uniform  segmentation  implementation  for  hardware 
function  evaluation.  This  circuit  is  useful  for  a  limited  class  of  functions.  Sasao,  Butler 
and  Riedel  [5]  present  a  universal  circuit  that  can  cater  to  a  wider  class  of  functions. 

Sasao,  Butler  and  Riedel  [5]  have  shown  that  elementary  and  non-elementary 
numeric  functions  can  be  computed  quickly  and  accurately  using  a  piecewise  linear 
approximation  method.  This  method  provides  some  advantages  over  the  memory  method 
and the CORDIC algorithm. Less memory is required than for a look-up table because the
numeric function is segmented and only the coefficients of the piecewise linear approximation
are stored, rather than every possible input value and its corresponding output. The other
advantage is that the accuracy is fixed at the outset, so the result is obtained faster than
with the CORDIC algorithm, especially at higher accuracy, where CORDIC must go
through several iterations to attain the desired accuracy. One more advantage of this
approach is that it allows for one hardware design, with the memory contents being
changed to handle different numeric functions [1].

This  thesis  investigates  a  piecewise  quadratic  implementation.  The  quadratic 
implementation  requires  fewer  segments  than  the  linear  implementation  to  compute  the 
same  numeric  functions  to  the  same  accuracy.  This  also  means  that  the  memory  required 
is  less  than  that  required  to  implement  a  piecewise  linear  approximation  NFG. 
B. IMPLEMENTATION OVERVIEW


Figure 1 shows the hardware required to build the NFG using quadratic
approximation.  The  NFG  architecture  requires  three  multipliers.  Each  requires 
significant  logic  and  causes  significant  delay. 


Figure 1. Numeric function generator (NFG) architecture.
Table 1 shows the suite of functions used to test and design the NFG. Unlike
logic or software design, there is no set of benchmarks. The specific functions have been
chosen because they have appeared in previous papers on this subject [1], [5], [8],
[9], [11], [12], [15].


#    Function f(x)                      Interval x           Interval f(x)
1    2^x                                [0,1]                [1,2]
2    1/x                                [1,2]                [1/2,1]
3    √x                                 [1,2]                [1,√2]
4    1/√x                               [1,2]                [1/√2,1]
5    log2(x)                            [1,2]                [0,1]
6    ln(x)                              [1,2]                [0,ln 2]
7    sin(πx)                            [0,1/2]              [0,1]
8    cos(πx)                            [0,1/2]              [0,1]
9    tan(πx)                            [0,1/4]              [0,1]
10   √(-ln(x))                          [1/512,1/4]          [√(-ln(1/4)),√(-ln(1/512))]
11   tan²(πx) + 1                       [0,1/4]              [1,2]
12   -(x log2 x + (1-x) log2(1-x))      [1/256,1-1/256]      [0,1]
13   1/(1 + e^-x)                       [0,1]                [1/2,1/(1 + e^-1)]
14   (1/√(2π)) e^(-x²/2)                [0,√2]               [1/(√(2π) e),1/√(2π)]
15   sin(e^x)                           [0,2]                [-1,1]

Table 1. Suite of numeric functions and their domains.




C.  THESIS  ORGANIZATION 

This  thesis  is  organized  into  six  chapters.  Chapter  I  is  the  introduction.  Chapter  II 
covers  the  segmentation  of  numeric  functions  and  the  methods  used  for  computing  the 
approximation  of  the  functions;  this  includes  the  discussion  on  how  the  coefficients  were 
computed and how the memory files were used in the NFG. These programs were
designed in MATLAB [7]. In Chapter III, the circuit design is described.
Chapter IV introduces the SRC computer architecture. The experimental results are
discussed in Chapter V. The summary and suggested future work are discussed in Chapter
VI.
II. FUNCTION APPROXIMATION


The NFG approximates the realized function by polynomials. In a typical
realization, many polynomials are used. A segment is a sub-domain in the interval of
approximation where one polynomial is used to approximate the function. In this thesis,
quadratic polynomials are used. The benefit of using a polynomial approximation is that
only one hardware design is required to realize a multitude of functions. The only change
required is to update the segment endpoints and the associated coefficients for the
function to be realized. The segment endpoints and
coefficients are generated in MATLAB and are stored in a memory file. Segmentation is
described in detail below.

The realized functions are approximated, and the output of the hardware is only as
accurate as the user-defined precision. The approximation error is ε. The exact function
is evaluated for various values in the domain. The polynomial that is used to
approximate the function is evaluated for the same values in the domain. The difference
between these two results is the approximation error ε. The approximation error ε is the
constraint used to keep the approximation in check.

The approximation error ε directly impacts how many segments are required and
therefore dictates how much memory is used to store the coefficients. Small values of ε
require many segments.
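The error measurement described above amounts to comparing the exact function and the polynomial at the same sample points and keeping the largest difference. The sketch below illustrates this with hypothetical helper names and an arbitrary example polynomial (a second-order Taylor expansion of e^x, not one of the thesis's segments); it also converts an error into the equivalent number of binary bits.

```python
import math

def max_abs_error(f, p, lo, hi, n_points=1000):
    """Largest |f(x) - p(x)| over n_points + 1 equally spaced samples of [lo, hi]."""
    xs = (lo + (hi - lo) * k / n_points for k in range(n_points + 1))
    return max(abs(f(x) - p(x)) for x in xs)

def accuracy_bits(error):
    """Express an error as a number of binary bits, e.g. 2^-24 corresponds to 24 bits."""
    return -math.log2(error)

def taylor(x):
    """Illustrative approximation: second-order Taylor polynomial of e^x about 0."""
    return 1.0 + x + 0.5 * x * x

if __name__ == "__main__":
    err = max_abs_error(math.exp, taylor, 0.0, 0.1)
    print(f"max error = {err:.3e}, about {accuracy_bits(err):.1f} bits")
```

A segment is acceptable when the value returned by `max_abs_error` does not exceed ε, that is, when `accuracy_bits` is at least the requested number of bits.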

A.  QUADRATIC  VS  LINEAR 

Nagayama,  Sasao  and  Butler  [8]  showed  that  using  quadratic  approximations  in 
the  NFG  requires  an  average  of  only  4%  of  the  memory  required  when  using  linear 
approximations.  This  gives  the  motivation  to  pursue  quadratic  approximation  following 
the  work  on  linear  approximation  that  was  performed  by  Mack  [1]. 

In  Table  2,  the  number  of  segments  required  for  different  accuracies  is  tabulated 
for  both  quadratic  approximation  and  linear  approximation.  A  column  is  also  included 
that  shows  the  ratio  of  quadratic  to  linear  segments  required. 
                                   ε = 2^-17               ε = 2^-24               ε = 2^-33
                                   Segments                Segments                Segments
Function                           Quad/Lin       %        Quad/Lin       %        Quad/Lin         %
2^x                                7/75         9.33       35/849       4.12       278/19008      1.46
1/x                                10/75       13.33       50/849       5.89       400/18996      2.11
√x                                 5/35        14.29       24/388       6.19       189/8729       2.17
1/√x                               8/50        16.00       36/565       6.37       288/12684      2.27
log2(x)                            9/76        11.84       44/853       5.16       351/19097      1.84
ln(x)                              8/63        12.70       39/710       5.49       311/15927      1.95
sin(πx)                            12/109      11.01       58/1227      4.73       461/27361      1.68
cos(πx)                            12/109      11.01       58/1227      4.73       459/27361      1.68
tan(πx)                            12/73       16.44       58/822       7.06       459/18371      2.50
√(-ln(x))                          33/207      15.94       163/2356     6.92       1312/47188     2.78
tan²(πx) + 1                       16/152      10.53       79/1721      4.59       631/38087      1.65
-(x log2 x + (1-x) log2(1-x))      37/314      11.78       183/3556     5.15       1459/76334     1.91
1/(1 + e^-x)                       4/20        20.00       20/226       8.85       158/5087       3.11
(1/√(2π)) e^(-x²/2)                9/53        16.98       45/595       7.56       357/13312      2.68
sin(e^x)                           54/449      11.80       265/5099     5.20       2121/101065    2.10

Table 2. Segmentation required for linear and quadratic approximations.


To calculate the memory required for a single segment, one needs to take into
account that a linear approximation requires only two quantities (slope and
intercept), whereas a quadratic approximation requires three. That is a
50% increase in memory requirements for a single segment when compared to linear.
However, the sheer difference in the number of segments required for quadratic versus
linear approximation more than compensates for the increase in per-segment memory.

Table 2 shows that quadratic approximations cover each function with far fewer
segments than linear approximations and, on average, quadratic approximations take up
only 4% of the memory required to represent the same function when using linear
approximations [8].
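The trade-off between more coefficients per segment and fewer segments can be checked with simple arithmetic. The sketch below assumes that every coefficient occupies one memory word of the same width in both designs and ignores the storage of segment endpoints, so it only indicates the trend; the 4% average quoted from [8] rests on that paper's own word widths. The segment counts are copied from the ε = 2^-24 column of Table 2, and the function name is hypothetical.

```python
# (quadratic segments, linear segments) for eps = 2^-24, copied from Table 2.
TABLE2_EPS24 = {
    "2^x": (35, 849),
    "1/x": (50, 849),
    "sqrt(x)": (24, 388),
}

def memory_ratio(quad_segments, lin_segments, quad_words=3, lin_words=2):
    """Quadratic memory divided by linear memory, counted in coefficient words."""
    return (quad_segments * quad_words) / (lin_segments * lin_words)

if __name__ == "__main__":
    for name, (quad, lin) in TABLE2_EPS24.items():
        print(f"{name:8s} {100 * memory_ratio(quad, lin):5.2f} % of the linear memory")
```

Under these assumptions the 50% per-segment overhead is outweighed many times over by the reduction in segment count, and the ratio shrinks further as ε becomes smaller.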


B.  SEGMENTATION 

To  evaluate  a  numeric  function  using  polynomial  approximation,  we  need  to 
segment  the  domain  of  the  numeric  function  such  that  each  segment  has  one  set  of 
coefficients  that  evaluate  to  the  polynomial  approximation  of  the  given  numeric  function. 
The polynomial approximation needs to satisfy the user-defined ε such that any value in
the domain that is evaluated using the polynomial will produce an output f(x) that has an
error no greater than ε in magnitude. The segmentation is performed by MATLAB
routines.

Segmentation can be performed using either uniform or non-uniform segments.
The coefficients of the approximation polynomial can be computed using Polyfit [7],
which is a built-in MATLAB function, or using the Chebyshev and Remez [13] algorithms,
which are implemented as user functions. We will discuss these approaches in more detail.

1. Uniform and Non-Uniform Segmentation

There are two general methods used in approximating a function: uniform and
non-uniform segmentation. Different functions behave differently when segmented using
uniform or non-uniform segmentation. Non-uniform segmentation allows the user to take
advantage of functions that have both rapidly changing and slowly changing
sections. When functions have sections of high curvature, non-uniform segmentation can
create smaller segments to ensure the polynomial approximation error does not exceed ε.
The more quadratic or linear the function is, the better a quadratic polynomial can fit
it. As a result, segments are longer in regions where the function
is linear or quadratic. The goal is to achieve the fewest segments possible and yet
achieve the approximation error specified by the user. Figure 2 shows the non-uniform
segmentation of √(-ln(x)) using ε = 2^-16 (accurate to 16 binary bits). This function
illustrates the advantage of non-uniform segmentation. The smaller segments are located
at the beginning of the domain and the larger segments are at the end.


[Figure: NON-UNIFORM f(x) = sqrt(-log(x)) segmentation. No. of segments = 26.]

Figure 2. Quadratic segmentation of √(-ln(x)) shows the difference in the size of
segments due to curvature of the function.


As mentioned above, the error associated with this segmentation should not
exceed ε = 2^-16. Figure 3 shows the error across the interval of approximation when
non-uniform segmentation is used. For all but the last segment, the maximum absolute error
is the same (about 2^-16.011). As shown in Figure 3, the error does not exceed ε anywhere.
Note that the error in the last (rightmost) segment is much less than in all other segments.
This is because the last segment is truncated by the boundary of the domain interval
before the algorithm has a chance to maximize the size of the segment.
Figure 3. Segment error of √(-ln(x)) when ε = 2^-16.


Figure 4 shows the approximation error for this same function when
uniform segmentation is applied¹. To achieve uniform segmentation within the same
approximation error specification, i.e. 2^-16, we are required to use the width of the
narrowest segment, which in this case is the very first segment.
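The consequence of fixing every segment to the narrowest width can be sketched in a few lines. The helper names are hypothetical; the point is that the segment count follows directly from the narrowest width, and that the segment index of an input is then a simple scaled offset rather than a search.

```python
import math

def uniform_segment_count(a, b, narrowest_width):
    """Number of equal-width segments needed so that none is wider than narrowest_width."""
    return math.ceil((b - a) / narrowest_width)

def uniform_index(x, a, b, n_segments):
    """Index of the uniform segment containing x, for x in [a, b]."""
    i = int((x - a) * n_segments / (b - a))
    return min(max(i, 0), n_segments - 1)   # clamp so that x == b falls in the last segment

if __name__ == "__main__":
    n = uniform_segment_count(0.0, 1.0, 0.3)
    print(n, [uniform_index(x, 0.0, 1.0, n) for x in (0.0, 0.5, 1.0)])
```

When the number of segments is a power of two, this index is just the leading bits of the input, which is why the uniform circuit needs no segment index encoder.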


1  Because  a  large  number  of  segments  are  required,  the  line  width  occupies  the  whole  of  the  figure, 
making  it  appear  completely  solid. 
[Figure: UNIFORM f(x) = sqrt(-log(x)) segmentation. No. of segments = 714. Horizontal axis: x.]

Figure 4. Quadratic uniform segmentation for √(-ln(x)) when limited by ε = 2^-16.


The error function for a uniform segmentation looks different from that of the
non-uniform segmentation. The error for uniform segmentation reaches its maximum, ε,
only in the most limiting segment. In the other segments, the
error does not reach ε. Therefore a tapered effect is observed. To best demonstrate this
effect, we shall use a less “dramatic” function than √(-ln(x)). Instead, cos(πx) is used in
Figure 5 and Figure 6 to show the difference in the error between the uniform and
non-uniform segmentation.

Below in Figure 5, the error is tapered, showing that the earlier segments don’t
take full advantage of the allowed error because their width has been limited by the
narrowest required segment, located at the end of the domain for the cos(πx) function.

In  Figure  6  however,  you  can  see  that  non-uniform  segmentation  has  taken  full 
advantage  of  all  the  space  and  has  fewer  segments  to  represent  the  same  function.  This  is 
the  advantage  of  the  non-uniform  segmentation  over  uniform  segmentation. 
[Figure: Error for UNIFORM f(x) = cos(pi*x) segmentation. No. of segs = 14.]

Figure 5. Uniform segmentation error for cos(πx) when limited by ε = 2^-17.

[Figure: Error for NON-UNIFORM f(x) = cos(pi*x) segmentation. No. of segs = 12.]

Figure 6. Error for non-uniform segmentation for cos(πx) when limited by ε = 2^-17.
For the segmentation of a numeric function, a user interface was designed in a
MATLAB program to get the user’s choices. The user interface allows the user to select
which function to segment, the number of
points (to subdivide the domain), ε, and whether uniform or non-uniform segmentation is
used.

If  the  user  selects  non-uniform  segmentation,  the  interface  looks  like  that  shown 
in  Figure  7. 


***************************************************************
     QUADRATIC APPROXIMATION OF A FUNCTION USING CHEBYCHEV
     AND REMEZ ALGORITHM
***************************************************************
     Functions to be compared                 Interval
 1.  2^x                                      [0,1]
 2.  1/x                                      [1,2]
 3.  sqrt(x)                                  [1,2]
 4.  1/sqrt(x)                                [1,2]
 5.  log2(x)                                  [1,2]
 6.  log(x) = ln(x)                           [1,2]
 7.  sin(pi*x)                                [0,1/2]
 8.  cos(pi*x)                                [0,1/2]
 9.  tan(pi*x)                                [0,1/4]
10.  sqrt(-log(x)) = sqrt(-ln(x))             [1/256,1/4]
11.  tan(pi*x)^2 + 1                          [0,1/4]
12.  -(x*log2(x) + (1-x)*log2(1-x))           [1/256,1-1/256]
13.  1/(1+exp(-x)) = 1/(1+e^(-x))             [0,1]
14.  (1/sqrt(2*pi))*exp(-x^2/2)               [0,sqrt(2)]
15.  sin(exp(x))                              [0,2]
***************************************************************
Input the Function, func[sqrt(-1*log(x))]:
(1)Non-uniform or (2)Uniform Segmentation or (3)Both [1]:
Input the Desired Error, epsilon[2^-33] : 2^-16
Input the no. of pts the fct is to be evaluated, N[1000000]:
********************************************************

Figure 7. Quadratic approximation user-interface when non-uniform segmentation has
been used.
If the user selects uniform segmentation, the user interface allows the user to
select whether to specify ε or to use a fixed number of
segments instead. The new user interface looks like that shown in Figure 8.
***************************************************************
     QUADRATIC APPROXIMATION OF A FUNCTION USING CHEBYCHEV
     AND REMEZ ALGORITHM
***************************************************************
     Functions to be compared                 Interval
 1.  2^x                                      [0,1]
 2.  1/x                                      [1,2]
 3.  sqrt(x)                                  [1,2]
 4.  1/sqrt(x)                                [1,2]
 5.  log2(x)                                  [1,2]
 6.  log(x) = ln(x)                           [1,2]
 7.  sin(pi*x)                                [0,1/2]
 8.  cos(pi*x)                                [0,1/2]
 9.  tan(pi*x)                                [0,1/4]
10.  sqrt(-log(x)) = sqrt(-ln(x))             [1/256,1/4]
11.  tan(pi*x)^2 + 1                          [0,1/4]
12.  -(x*log2(x) + (1-x)*log2(1-x))           [1/256,1-1/256]
13.  1/(1+exp(-x)) = 1/(1+e^(-x))             [0,1]
14.  (1/sqrt(2*pi))*exp(-x^2/2)               [0,sqrt(2)]
15.  sin(exp(x))                              [0,2]
***************************************************************
Input the Function, func[sqrt(-1*log(x))]: 8
(1)Non-uniform or (2)Uniform Segmentation or (3)Both [1]: 2
Would you like to constrain (1)Number of Segments or (2)Error [1]:
Input the number of Desired Segments[20]: 50
Input the no. of pts the fct is to be evaluated, N[1000000]:

Figure 8. Quadratic approximation user-interface when uniform segmentation has been
specified.


a.  Summary  of  Advantages  and  Disadvantages  of  Uniform  and 
Non-Uniform  Segmentation 

Table  3  shows  a  summary  of  the  advantages  and  disadvantages  between 
uniform  and  non-uniform  segmentation. 
                            Advantages                          Disadvantages

Uniform                     • No need for segment index         • High curvature functions
Segmentation                  encoder                             require many segments
                            • Less complex hardware               (wastes memory)

Non-Uniform                 • High curvature functions with     • Requires segment index
Segmentation                  segments that are as wide as        encoder
                              possible (saves on memory)        • More complex design

Table 3. Summary of Advantages and Disadvantages of Uniform and Non-uniform
Segmentation.


2.  Segment  Coefficients  Using  Polyfit  and  the  Remez  Algorithm 

To  obtain  the  coefficients  of  a  segment  when  segmenting  any  function,  several 
different algorithms may be used. In [5], Sasao et al. use the Douglas-Peucker algorithm
[10] for segmenting and providing linear approximations to the functions. However, this
algorithm does not yield an optimum segmentation [11].

The  initial  work  in  this  thesis  used  the  Polyfit  [7]  function,  available  in 
MATLAB,  to  find  the  coefficients.  Polyfit  is  computationally  efficient  and  has  been 
optimized for MATLAB. It requires a set of data points that represent the function to
which the user intends to fit a polynomial of order n. In this thesis, we are working with
quadratic functions and therefore use n = 2. Polyfit finds the coefficients of the
approximating polynomial in a least squares sense [7] and returns a row vector with the
coefficients of the polynomial. Least squares approximations minimize the sum of squared
errors over the interval selected, which keeps the average error small. However, the
worst-case error can be large. That is, Polyfit may yield an
average error that satisfies the given constraint ε while the worst-case error still
exceeds the constraint.

In analyzing the approximation polynomials produced from the coefficients
provided by the Polyfit function, the graphs of the error over each segment showed the
largest error at the start and end points of the segment, as can be seen in Figure 9 below.

This graph shows the weakness in using least squares approximation methods like
that used by Polyfit. Our goal is to minimize the number of segments for the given function
while restraining the maximum error to no greater than ε. Therefore, Polyfit was
abandoned and instead the Remez algorithm [13] was used.

The Remez algorithm uses a method of approximation that minimizes the worst-case
error. It belongs to the set of least maximum approximations (minimax
approximations). The program ensures that there is no point in the interval where the
error, found by evaluating the difference between the approximation polynomial and the
real function, is greater than the constraint given.


Figure 9. Quadratic non-uniform segmentation approximation error using Polyfit. (Plot
title: Error for non-uniform f(x) segmentation, number of segments = 14; error axis
scaled by 10^-5.)


The advantage of the Remez algorithm is that it distributes the error evenly over the
segment so that the maximum error is constrained by ε. This can be clearly observed by
comparing Figure 9 and Figure 10. The function cos(πx) with ε = 2^-17 was used in
both cases. Notice that Polyfit needed 14 segments while Remez required only 12 segments.
Both figures display only the first 4 segments, and the difference is readily noticeable:
Remez covers a larger portion of the domain in the four segments than Polyfit does. As a
result, it tends to reduce the number of segments. In the Remez implementation, the 4th
segment extends just past 0.21 in the x domain, while Polyfit barely makes it to 0.19.

The Remez algorithm attempts to achieve the minimax degree-n polynomial
approximation of the given function on a defined interval. In the program that was used
for this thesis, the interval is iteratively revised and the Remez algorithm is repeatedly
called until a degree-2 polynomial approximation that satisfies the constraint is achieved.
The process is constrained by ε, and the interval is increased or decreased until the
optimum segment endpoint lies between the current point and the next point on the
domain interval.


Figure 10. Quadratic non-uniform segmentation approximation error using Remez. (Only
the first four segments are shown).

The Remez algorithm requires much more computational time and effort than the
Polyfit function (which is already optimized for MATLAB). In general, for a function f on an
interval [a, b], there are many approximating polynomials of degree n, but only one
polynomial p* is the minimax degree-n approximation. For this approximation there are at
least n+2 points, ordered as described in inequality (0.1), at which the error reaches its
maximum magnitude and alternates in sign.

\[ a \le x_0 < x_1 < \cdots < x_{n+1} \le b \qquad (0.1) \]

The begin point and end point of the interval are included. In the case of
quadratic approximations, a degree-2 polynomial has at least 4 points where the
error is maximum and alternates in sign, as seen in Figure 10. The Remez
algorithm is iterative and requires an initial estimate of the points where the error is
maximum. The Chebyshev approximation is better than most other approximation algorithms in
obtaining a polynomial close to the minimax polynomial p*. When compared to Taylor
series and Legendre approximations, Chebyshev provides a better estimate in most cases.
For this reason, Chebyshev approximation is used to provide a set of starting points for
the Remez algorithm in this thesis. The previous discussion is described in more detail in [13].
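
The alternation property can be checked numerically with a compact exchange iteration. The sketch below is in Python and is not the ChebyRemez function of Appendix B; the names solve and remez_quadratic, the 1000-point grid, and the test function 2^x on [0, 0.16] are assumptions chosen for illustration. It starts from the four Chebyshev extrema, solves for the quadratic and the levelled error E, and then moves the reference points to the largest error in each run of constant error sign.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            m = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= m * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def remez_quadratic(f, a, b, grid=1000, iters=10):
    """Return (coefficients, max |error| on the grid, error at the 4 reference points)."""
    mid, half = (a + b) / 2, (b - a) / 2
    # Initial reference: the 4 Chebyshev extrema mapped onto [a, b] (includes a and b)
    ref = [mid - half * math.cos(k * math.pi / 3) for k in range(4)]
    xs = [a + (b - a) * i / grid for i in range(grid + 1)]
    for _ in range(iters):
        # Solve c0 + c1*x + c2*x^2 + (-1)^k * E = f(x) at the reference points
        A = [[1.0, x, x * x, (-1.0) ** k] for k, x in enumerate(ref)]
        c0, c1, c2, E = solve(A, [f(x) for x in ref])
        err = [f(x) - (c0 + c1 * x + c2 * x * x) for x in xs]
        # Exchange: split the grid into runs of constant error sign, keep the peak of each
        runs, start = [], 0
        for i in range(1, grid + 2):
            if i == grid + 1 or (err[i] >= 0) != (err[start] >= 0):
                runs.append(max(range(start, i), key=lambda j: abs(err[j])))
                start = i
        new_ref = [xs[j] for j in runs]
        if len(runs) != 4 or new_ref == ref:
            break
        ref = new_ref
    return (c0, c1, c2), max(abs(e) for e in err), [err[j] for j in runs]

coef, max_err, peaks = remez_quadratic(lambda x: 2.0 ** x, 0.0, 0.16)
```

For this smooth test function the four values in peaks have nearly equal magnitude and alternate in sign, which is the n+2 = 4 point behavior described above for a degree-2 approximation.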

The function ChebyRemez in Appendix B was written to implement the Remez
algorithm with an initial set of points where the error is maximum. Using Remez slowed
down the program written to compute the coefficients, especially when higher accuracy
was desired or, in general, when the x domain interval was assigned more points, N. To
counter this effect, different algorithms were investigated to speed up the program.
These are discussed further in Section 3 below.

3.  Algorithms  Investigated  to  Speed-Up  the  Segmentation 

In the program proposed by Sasao, Butler and Riedel [5], the domain was divided
into points and segmentation was determined by brute force, i.e. point by point to
determine the required size of the segment. To attain high accuracy, the domain needs to
be subdivided into hundreds of thousands and even millions of points. This results in
slow execution. We investigate ways to speed up the segmentation.


a.  Brute  Force 

The lower value of the domain is established as the begin point. The
program steps through each point, computing the minimax degree-2 polynomial
approximation of the function. When evaluating any segment (even two consecutive
points), the program creates 1000 points between the given begin point and the end point.
This ensures enough points for the program to locate the points in the segment where the
maximum and minimum error is achieved, as described above. The coefficients required
are then computed and, next, the approximating polynomial is evaluated at all the
points in the current segment. These values are compared with the actual values from
computing the real function, and the maximum error is determined. If the error is smaller
than ε, the program steps one point to the right and repeats the process. Eventually, the
polynomial approximation will produce a maximum error that
exceeds ε. At this point the program steps back one step and records the end point of the
segment. For a typical segmentation with N = 1,000,000, this program takes a long time.
N is defined as the number of points on the entire interval of the domain, i.e. the number of
points on the interval [a, b].
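
The brute force procedure can be sketched as follows. This is a Python illustration, not the thesis program: model_error is a hypothetical stand-in for a call to the Remez routine, using the leading-order estimate max|f'''| · width^3 / 192 for f(x) = 2^x, and the grid of 1001 points is an assumption chosen to keep the run short.

```python
import math

LN2_CUBED = math.log(2) ** 3

def model_error(x0, x1):
    """Hypothetical stand-in for the Remez call: estimated worst-case error of a
    quadratic fit to f(x) = 2**x on [x0, x1] (third derivative is largest at x1)."""
    return LN2_CUBED * 2.0 ** x1 * (x1 - x0) ** 3 / 192.0

def brute_force_segments(a, b, n_pts, eps):
    """Return (segment end indices, number of error evaluations)."""
    xs = [a + (b - a) * i / (n_pts - 1) for i in range(n_pts)]
    ends, calls, begin = [], 0, 0
    while begin < n_pts - 1:
        end = begin + 1
        while end < n_pts:                 # step one point to the right at a time
            calls += 1
            if model_error(xs[begin], xs[end]) > eps:
                break                      # constraint exceeded at this point
            end += 1
        end -= 1                           # step back to the last point that passed
        if end == begin:
            raise ValueError("even two adjacent points violate the constraint")
        ends.append(end)
        begin = end
    return ends, calls

N = 1001
EPS = 2.0 ** -17
ends, calls = brute_force_segments(0.0, 1.0, N, EPS)
```

Every grid point is visited at least once, so the number of error evaluations grows linearly with N; this is the cost that the methods in the following subsections try to avoid.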

b.  Binary  Search 

Binary search is really a two-step process:

1. Locate: A point close to the optimum point is determined.

2. Pinpoint: Use brute force to move up to the optimum point.

In step 1, given a function f and an interval [a, b], starting on the left at a,
the lower value of the domain is established as the begin point and the end point is set to
b. This is the entire domain interval over which the program computes the minimax
degree-2 polynomial approximation. Given the constraint, ε, the program tests the error
of the approximation and, if the error is greater than the constraint, the program divides
the interval into two equal parts and decreases the proposed interval. Figure 11 shows a
graphic representation of the first 4 iterations. These iterations are part of step 1, Locate.

The optimum endpoint of the first segment is labeled x0. Figure 11
shows the first iteration: the interval [a, b] is tested to determine if it is a good segment
size. Since it is too large, the interval is divided in two. The new interval is [a,
1st proposed x0]. The process is repeated and the approximation of this new proposed
segment is tested against the constraint. This is an iterative process that decreases the
width of the segment. The next proposed segment is [a, 2nd proposed x0], as shown in
Figure 11. Again the segment is tested. If the constraint is not met, the segment is
decreased by 1/2.


Figure 11. Shows the interval and segmentation notation. (Plot title: non-uniform
f(x) = 2^x segmentation, number of segments = 7.)

The process is repeated until the constraint is met. In Figure 11, the
constraint is met on the fourth try and results in a proposed segment [a, 3rd proposed x0].
Once below the optimum end point, the program increases the proposed segment
endpoint until the constraint is exceeded. This means the segment is increased by half of
the last width used to decrease the proposed segments. In Figure 11, the last width was
2nd proposed x0 - 3rd proposed x0. The process of increasing and decreasing the
segment size by widths that are halved per iteration is repeated until the width being used
to increment or decrement is 1. At this point, we are done with step 1 (Locate) and we
move to step 2. Step 2 uses brute force to Pinpoint the optimum segment.

The binary search finds the actual segment end point in approximately s
steps as described by inequality (0.2), where npts is the number of points in the initial
proposed segment.

\[ s > 1 + \log_2(\mathit{npts}) \qquad (0.2) \]


Compared to the number of steps required by brute force, this is a
dramatic improvement. Consider N = 1,000,000; the binary search for the first
segment should then take around 21 steps to find the optimum segment end point x0, since
npts in this case is 1,000,000. The number of steps required to reach the segment end point is
reduced as the program progresses to the end of the domain interval. This is because the
argument npts in inequality (0.2) decreases. In Table 4, the binary search takes 924 calls to
the function chebyRemz, as opposed to the brute force method, which makes 1,000,000
calls.
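
A minimal Python sketch of Locate followed by Pinpoint is given here to make the step count concrete. The function name locate_and_pinpoint, the predicate fits (a hypothetical stand-in for "call the Remez routine on [begin, end] and compare the error with ε"), and the target index 157,913 are assumptions for illustration; the predicate is assumed to be monotone, i.e. once a segment is too wide, every wider segment is too wide as well.

```python
import math

def locate_and_pinpoint(fits, begin, last):
    """Return (end, calls): the largest end in (begin, last] with fits(begin, end)."""
    calls = 1
    end, ok = last, fits(begin, last)
    if ok:
        return end, calls                  # the rest of the domain is one segment
    step = last - begin
    while step > 1:                        # Locate: halve the step on every iteration
        step = (step + 1) // 2
        end = end + step if ok else end - step
        end = max(begin + 1, min(last, end))
        ok = fits(begin, end)
        calls += 1
    while not ok and end > begin + 1:      # Pinpoint: single steps back into range
        end -= 1
        ok = fits(begin, end)
        calls += 1
    if not ok:
        raise ValueError("even two adjacent points violate the constraint")
    while end < last:                      # Pinpoint: single steps up to the optimum
        calls += 1
        if not fits(begin, end + 1):
            break
        end += 1
    return end, calls

N = 1_000_000
TARGET = 157_913                           # hypothetical optimum end index

def fits(begin, end):
    return end - begin <= TARGET           # stand-in for "max error <= epsilon"

end, calls = locate_and_pinpoint(fits, 0, N - 1)
bound = 1 + math.log2(N)                   # right-hand side of inequality (0.2)
```

With N = 1,000,000 the right-hand side of (0.2) is about 20.9, and the sketch needs only a couple of evaluations beyond that because Pinpoint starts within one point of the optimum.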

The number of calls to the user-programmed MATLAB function
chebyRemz is used as a metric for two reasons: (1) the code for chebyRemz takes longer
to execute than any other piece of code in the program and (2) the number of calls to
chebyRemz varies depending on which numeric
function is being segmented. Appendix D shows a copy of profile results [7] that shows
the execution time of each function. The goal is to minimize the number of calls to
chebyRemz, thus speeding up the program.

Appendix A.2.1, part b shows the portion of the program that applies this
method. The file name is varQuadApproxBinSearch.m.

Table 4 shows the number of calls to the function chebyRemez for 9
different algorithms that were investigated to speed up the segmentation. The first
column is the number of points used to subdivide the domain. The next 9 columns are
the different algorithms and the results. Only one function and one accuracy were used:
√(-ln(x)) and ε = 2^-17, respectively.


N       Binary   Thirds   Ratio   1Est    2Est   3Est   Avg    Avg    Hybrid w/ Thirds
        Search                                          1Est   3Est   & 3Avg *1.05

1 M     924      764      1143    65400   3369   1903   5972   1960   98
100 K   764      640      699     6620    430    293    697    298    98
10 K    649      529      563     739     132    127    166    129    103
1 K     488      429      450     181     114    120    128    122    117

Table 4. Number of calls to the function chebyRemz made by the various methods;
segmentation of √(-ln(x)), ε = 2^-17 and various values of N.


c.  Divide  by  Thirds 

A second program was implemented that applied the same principle as
binary search; however, instead of taking off half of the width, the program took off two
thirds (i.e. it divides the remaining width by three). This method is therefore also a
two-step process:

1. Locate: A point close to the optimum point is determined.

2. Pinpoint: Use brute force to move up to the optimum point.

Figure 12 shows the segmentation for the 5th segment. The domain
interval is [a, b]; we start the segmentation of segment 5 at the end of the 4th segment, x4.

Step 1: Denote the unsegmented part of the interval as [x4, b]. A call to
the function chebyRemez is used to generate a quadratic approximation. This
approximation is tested to see if any points exceed the constraint ε. If the constraint is
met, then we have the final segment. Exit.

Step 2: Divide the initial width by three; the new value is 1/3 of the initial
width. This is labeled as L1 in Figure 12. L1 is now the new proposed segment width
and chebyRemez is called to establish a quadratic approximation for the interval. The
point labeled x5 is the optimum segment endpoint. In Figure 12, L1 is clearly not the
optimal width.

Step 3: The program divides L1 by three and the result is L2. A quadratic
approximation is computed to test the approximation error against the constraint. Since
L2 is below the optimum point, we initialize a new variable, delta, to keep
track of the width that is being added to or subtracted from the proposed width of the
segment. delta is 1/3 of L2.

Step 4: Increase L2 by delta, i.e. 1/3 of L2. This results in L3, which is tested to
determine the approximation error. In Figure 12, L3 is still short of the optimum
segment.

Step 5: Increase L3 by the same delta, i.e. 1/3 of L2. The approximation
is computed for the new proposed segment of width L4, and the approximation error is
tested against the constraint. This time we have exceeded the optimum endpoint, i.e. the
approximation error is greater than ε. In Figure 12, L4 is larger than the optimum point.

Step 6: Since we have exceeded the optimum segment, we now reduce the
variable delta to 1/3 of delta. This value is then used to reduce L4 to a narrower width, i.e.
L5. In Figure 12, L5 is still wider than the optimum width.

When the increment width is 2 or less, Locate is complete and the program
goes to Pinpoint. The process stops when two adjacent points straddle the optimum
segment endpoint. The lower value is x5, the segment endpoint for the program. Since
the domain has been divided into discrete points, x5 is just shy of the optimum point.
The approximation error of the new segment meets the constraint; however, the next
point to the right of the optimum point has an approximation error that exceeds ε.
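
Steps 1 through 6 and the final Pinpoint can be sketched in Python as shown here. The function thirds_end is an illustration under stated assumptions rather than a transcription of varQuadApproxTHIRD.m: fits is a hypothetical, monotone stand-in for testing the Remez approximation error of [begin, end] against ε, widths are measured in grid points, and the target index is invented for the example.

```python
def thirds_end(fits, begin, last):
    """Return (end, calls): the largest end in (begin, last] with fits(begin, end)."""
    calls = 1
    if fits(begin, last):                    # Step 1: the rest of the domain fits
        return last, calls
    width, ok = last - begin, False
    while not ok and width > 1:              # Steps 2-3: cut to 1/3 until below optimum
        width = max(1, width // 3)
        ok = fits(begin, begin + width)
        calls += 1
    if not ok:
        raise ValueError("even two adjacent points violate the constraint")
    end, delta = begin + width, max(1, width // 3)
    while delta > 2:                         # Steps 4-6: converge on the endpoint
        step = delta if ok else -delta
        while True:                          # move by delta until pass/fail flips
            end = max(begin + 1, min(last, end + step))
            calls += 1
            if fits(begin, end) != ok:
                break
        ok = not ok
        delta //= 3                          # Step 6: shrink delta to 1/3
    while not ok:                            # Pinpoint: single steps back into range
        end -= 1
        ok = fits(begin, end)
        calls += 1
    while end < last:                        # Pinpoint: single steps up to the optimum
        calls += 1
        if not fits(begin, end + 1):
            break
        end += 1
    return end, calls

N = 1_000_000
TARGET = 157_913                             # hypothetical optimum end index

def fits(begin, end):
    return end - begin <= TARGET             # stand-in for "max error <= epsilon"

end, calls = thirds_end(fits, 0, N - 1)
```

The returned end is the lower of the two adjacent points that straddle the optimum, matching the stopping condition described above. Call counts from this toy predicate are not comparable with Table 4, which counts calls made while segmenting a real function.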

The results showed an improvement over binary search. Table 4 shows
that the method of Thirds called the function chebyRemez 764 times, as opposed to the
binary search method, which took 924 calls to achieve the same segmentation.

Other values besides one-third were tested, but they did not perform
consistently better. Appendix A.2.1, part c shows the portion of the program that applies
this method. The file name is varQuadApproxTHIRD.m.


Figure 12. Visual aid for description of divide by thirds algorithm. (Diagram title:
Divide Interval by Thirds. Labels: start point, optimum endpoint x5, interval end; phases
"Starting: get below segment endpoint" and "Loop to converge on segment endpoint".
Legend: L1 = 1/3 of initial length; L2 = 1/3 of L1, initialize delta = L2/3;
L3 = L2 + delta; L4 = L3 + delta; L5 = L4 - delta with delta = delta/3.)


d.  Increment  by  Ratio  Numbers 

In this method, the width of the proposed segment is increased or
decreased by multiplying the current proposed width by a series of fixed values. We
have the same 2-step process of Locate and Pinpoint.

In Locate, the proposed width is the entire remaining width of the domain
interval [a, b], i.e. the width from point a to point b. The width is tested to see whether
the constraint has been exceeded; except for the last segment, the width will always
exceed the optimum segment because the entire remainder of the interval is used per
iteration. As an example, consider that the first segment [a, x0] is already established
(segment [a, x0] as shown in Figure 11). Next, the program needs to compute the second
segment. The program will establish a proposed width [x0, b]. This is the entire
remainder of the interval. The ratios are applied to the width [x0, b]. The result is
shorter widths that are tested until the constraint is met. This method is similar to the
method “Divide by Thirds,” except that a set of ratios is applied to the
increment/decrement width.

Table 4 shows that the implementation of increment by ratio numbers took
1143 calls to the chebyRemz function. Appendix A.2.1, part d shows the portion of the
program that applies this method. The file name is varQuadApproxRatio.m.

e.  Estimated  Segment  Widths  (1,  2,  3,  more  and  Average) 

Again, the 2-step process of Locate followed by Pinpoint is applied here.
In Locate, an estimate of the segment width is calculated.

Equation (0.3) is adapted from [15] to compute segment estimates for
quadratic approximations. The derivation is in Appendix F. The accuracy, ε, and the
third derivative of the function are used to estimate the width of the segments. The proposed
segment widths are tested and the program falls back on the brute force method after the
initial estimate. This yields a large improvement over using the brute force method
alone.

\[ \mathit{EstSegLen} = 4\left(\frac{3\varepsilon}{\left|\,d^3y/dx^3\,\right|_{\max}}\right)^{1/3} \qquad (0.3) \]

One Estimate: In Table 4, when one estimate is used, the third
derivative is computed at x = begin point of the segment. The estimated width is added to
the begin point and the proposed segment is tested. The brute force method then takes over
and single-steps to the optimum segment width. The result was 65,400 calls to
chebyRemez.
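
As a quick numerical illustration of equation (0.3), the Python sketch below evaluates a single estimate at the begin point. The name est_seg_len and the choice of f(x) = 2^x with ε = 2^-17 are assumptions for the example. Since 4^3 · 3 = 192, the formula is algebraically the same as choosing the width w for which |d^3y/dx^3| · w^3 / 192 equals ε, which the last lines verify.

```python
import math

def est_seg_len(eps, third_derivative):
    """Equation (0.3): estimated segment width for a quadratic approximation."""
    return 4.0 * (3.0 * eps / abs(third_derivative)) ** (1.0 / 3.0)

def f3(x):
    """Third derivative of f(x) = 2**x, namely (ln 2)**3 * 2**x."""
    return math.log(2) ** 3 * 2.0 ** x

EPS = 2.0 ** -17
begin = 0.0
width = est_seg_len(EPS, f3(begin))        # one estimate, taken at the begin point
proposed_end = begin + width

# Cross-check: at this width, |f'''| * w**3 / 192 reproduces epsilon.
implied_eps = abs(f3(begin)) * width ** 3 / 192.0
```

Because the third derivative of 2^x grows with x, an estimate taken only at the begin point overshoots slightly here, which is one reason the estimate is followed by a search for the actual endpoint.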

Two Estimates: The first estimated width is calculated using equation
(0.3) with the third derivative computed at the begin point of the segment. The resulting
estimated width is added to the begin point and the resulting endpoint is used in equation
(0.3) to make a second estimated width using the third derivative at the endpoint. The
average of these two widths is the estimated width that is applied to the begin point to
obtain a proposed endpoint. Again, the program uses the brute force method to complete
the segmentation. This method improved the performance and took 3369 calls to
chebyRemez.

Three Estimates: Two estimates are computed as described above. The
result is divided in half, and the half-way point is used to compute the third estimate. The
third estimate is averaged with the other two estimated widths to obtain the proposed
segment width. As in the other two cases, the brute force method is then applied to
complete the segmentation. Even further improvement was achieved: 1903 calls to
chebyRemez.
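
One possible reading of the two-estimate and three-estimate procedures is sketched below in Python. The names est_seg_len, two_estimates and three_estimates are hypothetical, the test function 2^x is chosen for illustration, and the half-way point is taken to be the middle of the segment proposed by the two-estimate average, which is an interpretation of the description above.

```python
import math

def est_seg_len(eps, third_derivative):
    """Equation (0.3): estimated segment width for a quadratic approximation."""
    return 4.0 * (3.0 * eps / abs(third_derivative)) ** (1.0 / 3.0)

def two_estimates(f3, begin, eps):
    w1 = est_seg_len(eps, f3(begin))            # estimate at the begin point
    w2 = est_seg_len(eps, f3(begin + w1))       # estimate at the first proposed endpoint
    return w1, w2, (w1 + w2) / 2.0

def three_estimates(f3, begin, eps):
    w1, w2, avg2 = two_estimates(f3, begin, eps)
    w3 = est_seg_len(eps, f3(begin + avg2 / 2.0))   # estimate at the half-way point
    return (w1 + w2 + w3) / 3.0

def f3(x):
    """Third derivative of f(x) = 2**x."""
    return math.log(2) ** 3 * 2.0 ** x

EPS = 2.0 ** -17
w1, w2, width2 = two_estimates(f3, 0.0, EPS)
width3 = three_estimates(f3, 0.0, EPS)
```

For a function whose third derivative increases across the segment, the endpoint estimate w2 is narrower than the begin-point estimate w1, so the averages pull the proposed width back toward the region where the constraint is met.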

Estimates  with  more  than  three  widths  were  tested,  but  the  performance 
began  to  degrade.  So,  an  average  was  applied  to  the  segments. 

Average of one estimate: In the average method, one estimate was
computed from the begin point. The estimate was used to define a proposed segment.
The entire set of points on this proposed width is evaluated using equation (0.3). Then,
the mean of the resulting vector of estimated widths was computed and used as the
proposed segment width. The result appeared to be similar (not identical) to taking two
estimates (when multiple functions are tested, on average the results of two estimates and
the average method are similar). Table 4 shows that this method called chebyRemez 5972
times.

Average of three estimates: This method builds on taking three
estimates as described above. All the points on the proposed width are evaluated with
equation (0.3). This creates a vector of proposed estimates. Next, the mean of
the vector of proposed estimates is evaluated to get one estimate. The results of this method are
similar to taking three estimates. However, since we evaluate all the points on the
interval, it takes slightly longer.
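
The averaging step can be sketched in Python as follows. The sketch follows the order of operations visible in the profiled MATLAB lines of Table 7, where the third derivative is averaged over the proposed segment and equation (0.3) is then applied once; the function name averaged_length, the 1001-point grid, and the initial proposal of 164 points are assumptions for illustration.

```python
import math

def averaged_length(f3, xs, indx, length, eps):
    """Refine a proposed segment length (in points) by averaging the third
    derivative over every grid point of the proposed segment."""
    length = min(length, len(xs) - 1 - indx)          # do not run past the domain
    der3 = [f3(x) for x in xs[indx:indx + length + 1]]
    av3der = sum(der3) / len(der3)                    # mean third derivative
    x_range = 4.0 * (3.0 * eps / abs(av3der)) ** (1.0 / 3.0)   # equation (0.3)
    pts_range = xs[-1] - xs[0]
    return round(x_range / pts_range * len(xs))       # width converted to points

def f3(x):
    """Third derivative of f(x) = 2**x."""
    return math.log(2) ** 3 * 2.0 ** x

N = 1001
xs = [i / (N - 1) for i in range(N)]
EPS = 2.0 ** -17
proposed_len = averaged_length(f3, xs, 0, 164, EPS)
```

The extra cost is one evaluation of the third derivative per grid point in the proposed segment, which is cheap compared with a call to the Remez routine.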

In  [15],  a  comparison  was  made  to  show  the  benefit  of  three  estimates 
over  two  estimates  and  one  estimate  in  the  case  of  linear  approximation.  While  it  is  not 
discussed  in  [15],  one  estimate  was  computed  in  the  linear  approximation  and  the 
resulting  proposed  width  was  used  to  compute  the  mean  of  all  the  estimates  obtained 
from  evaluating  all  the  points  on  the  proposed  width.  The  mean  of  the  estimates  was 
similar  to  taking  the  mean  of  just  two  estimates  (begin  point  and  proposed  endpoint).  In 
the  quadratic  case,  the  same  method  yielded  results  that  were  comparable  to  taking  the 
mean  of  two  estimates,  just  like  the  results  in  the  linear  case.  However,  when  the  mean 
of  three  estimates  was  used  to  define  a  proposed  segment  and  the  average  of  all  the 
estimates  on  the  newly  proposed  width  was  computed,  the  result  was  very  close  to  taking 
the  mean  of  just  three  estimates. 

Closer analysis revealed that, in many cases, the average of all the points
worked well and sometimes even better than just the mean of three individual estimates.
The results appear in Table 5. The first column is the suite of numeric functions
represented by a number; the focus should be on the comparison, not any particular
numeric function. The second column is just the three estimates as described above, and the
third column is the average of the estimates calculated using all points on the proposed
segment. The fourth column is the difference between the second and third columns. The
last column is a method described in part f, Hybrid of Thirds and 3 Estimates. Table
5 shows that taking the average of all estimates on the segment has a slight advantage
over taking the average of just three single estimates. Looking back, Table 4
used only one numeric function, and that made it appear that the method of 3Avg was
slightly worse, whereas in Table 5 we can see that, when applied to the entire suite of
functions, the average over the entire segment (which was selected after three estimates)
was slightly better. The values at the bottom are the sum of all the calls to the
approximating algorithm, chebyRemez, which was the metric used to determine the
comparative speed of the program.


Function by   3Est    3Avg    Comparison       *1.05 Hyb
Numbers                       (3Est - 3Avg)    3Avg

1             23      29      -6               20
2             93      103     -10              29
3             148     146     2                14
4             133     145     -12              23
5             83      84      -1               26
6             90      95      -5               23
7             266     87      179              59
8             6326    6210    116              61
9             128     92      36               35
10            293     298     -5               98
11            6233    6203    30               65
12            925     581     344              172
13            230     81      149              39
14            7378    7203    175              95
15            650     963     -313             222
SUM           22999   22320   679              981

Table 5. Comparison of “3 estimates”, mean of all estimates computed on proposed
segment that was calculated after taking 3 estimates; “3 average” and a hybrid
that exaggerates the approximation error by 5%. All cases, N = 100,000
and ε = 2^-17.


The next question is: should we use just three estimates or should we use
the average of all the estimates computed from all the points on a proposed segment?
The difference is small. The impact of the additional code that takes the average of an
entire segment did not exceed the time taken by chebyRemez and did not significantly
impact the computing time of the program.

The additional code does not add significantly to the program and,
since it has advantages, we kept the program that averages the estimates over the entire
segment. The analysis to support that decision follows. Consider the small section of a
Profile report from MATLAB that is similar to the one in Appendix D.

Table 6 shows the total time for varQuadApprox implemented with only
three estimates. The time for the function, including all child functions, is 44.438s. These
values come from running the program with the function -(x log2 x + (1-x) log2(1-x)),
N = 1,000,000 and ε = 2^-33.


Profile Summary
Generated 21-Aug-2007 22:25:40

Function name                Calls    Total Time   Self Time*

multipleQuadApprox           1        44.906 s     0.156 s
varQuadApproxHyb3EstThird    1460     44.438 s     3.516 s
chebyRemz                    13187    39.156 s     16.406 s
inline.subsref               87050    20.031 s     3.031 s
inlineeval                   87202    17.031 s     17.031 s
polyval                      69483    3.828 s      3.359 s
twosComp                     5840     3.000 s      0.188 s

Table 6. Profile Report for -(x log2 x + (1-x) log2(1-x)), N = 1,000,000 and ε = 2^-33. Shows
44.438s for the varQuadApprox function that averages only three estimates.


The same function and parameters were run with the additional code that
takes the average of all estimates over the entire segment. The results appear in Table 7.
The total time for varQuadApprox and all its child functions is 20.078s. The additional
code to compute the averages took 0.061s, which translates to less than 1% of the time
spent in varQuadApprox. Therefore, the cost of the additional code is negligible. This
particular function clearly shows the advantage of taking the average: greater than 50%
improvement (44s to 20s).

It should be noted that, in a few cases, the improvement was not as
dramatic and, for √(-ln(x)), the averaging code performed worse by 20% (20 seconds to 25
seconds). However, on average, it was better to take the average over the entire segment.

A slightly different problem is what happens when the third derivative is
zero. This presents a problem in the computation of estimates (the third derivative is in
the denominator of equation (0.3)). One way to tackle the problem is to find
the smallest non-zero third derivative magnitude over the entire domain interval [a, b]
and use that to calculate the largest expected segment. This large segment is substituted
whenever the third derivative is zero. In many cases, the resulting estimate is a poor
estimate of the segment size, and tends to slow down the program when encountered.
Therefore, a hybrid of the best segmentation processes was used and is described below.
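
The zero third derivative fallback can be sketched in Python as follows. The function names, the use of cos(πx) on [0, 0.5] (whose third derivative vanishes at x = 0), and the 1001-point grid are assumptions for illustration, not the thesis code.

```python
import math

def third_derivative_floor(f3, xs):
    """Smallest non-zero |f'''| over the grid, or None if f''' is zero everywhere."""
    nonzero = [abs(f3(x)) for x in xs if f3(x) != 0.0]
    return min(nonzero) if nonzero else None

def est_with_fallback(eps, f3, x, floor):
    """Equation (0.3), substituting the largest expected segment when f'''(x) is zero."""
    d = abs(f3(x))
    if d == 0.0:
        if floor is None:
            return math.inf               # f is at most quadratic on the whole grid
        d = floor
    return 4.0 * (3.0 * eps / d) ** (1.0 / 3.0)

def f3(x):
    """Third derivative of f(x) = cos(pi*x), namely pi**3 * sin(pi*x)."""
    return math.pi ** 3 * math.sin(math.pi * x)

N = 1001
xs = [0.5 * i / (N - 1) for i in range(N)]
EPS = 2.0 ** -17
floor = third_derivative_floor(f3, xs)
w_at_zero = est_with_fallback(EPS, f3, 0.0, floor)      # fallback is used here
w_at_quarter = est_with_fallback(EPS, f3, 0.25, floor)  # ordinary estimate
```

The substituted width is far larger than the estimates obtained elsewhere on the interval, which illustrates why it is often a poor starting point for the search that follows.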
varQuadApproxHyb3AvgThird (1462 calls, 20.078 sec)

Parents (calling functions)

Filename              File Type     Calls
multipleQuadApprox    M-function    1462

Lines where the most time was spent

Line Number              Code                                  Calls   Total Time   % Time

98                       [p,oscil,errP] = chebyRemz(fct...     1462    4.531 s      22.6%
194                      [p,oscil,errP] = chebyRemz(fct...     994     3.375 s      16.8%
209                      [p,oscil,errP] = chebyRemz(fct...     1001    3.141 s      15.6%
182                      [p,oscil,errP] = chebyRemz(fct...     945     2.859 s      14.2%
133                      [p,oscil,errP] = chebyRemz(fct...     1010    2.719 s      13.5%
Other lines & overhead                                                 3.453 s      17.2%
Totals                                                                 20.078 s     100%

< 0.01   1461   79   if len+indx > length(x_pts)
                80       len = length(x_pts) - indx;
                81   end
0.61     1461   82   Der3Intr = f3der(x_pts(indx:indx+len)); % Get...
0.03     1461   83   AV3DER = mean(Der3Intr); % ...
< 0.01   1461   84   x_range = 4*(epsilon*3/abs(AV3DER))^(1/3); % Get...
< 0.01   1461   85   len = round(x_range/(x_ptsRange)*length(x_pts));
< 0.01   1461   86   if len+indx > length(x_pts)

Table 7. Profile Report for -(x log2 x + (1-x) log2(1-x)), N = 1,000,000 and ε = 2^-33. Shows
20.078s for the varQuadApprox function and 0.061s for the average of all the
estimates on the entire segment.


f.  Hybrid  of  Thirds  and  3  Estimates 

In this algorithm, we take advantage of the strengths of two programs. As
with the other algorithms, we have a Locate and Pinpoint step. However, Locate is a
combination of Divide by Thirds and 3 Estimates.

We know that ε is the constraint and that, when the approximation is
good, the ratio of the maximum approximation error to ε should be very close to 1.0.
This ratio can be used as a metric to determine the quality of our estimate. If the ratio is
much larger than 1.0, the estimated segment is too wide. If it
is much less than 1.0, our estimate is too small.

To take advantage of the ratio of the approximation error to $\varepsilon$, the program
first takes the average of the three estimates and, using the estimated width, computes the
approximation error. If the ratio of the approximation error to $\varepsilon$ is large
(greater than 1.002) or small (less than 0.9), the program takes the estimated width as a
starting width. The program then takes a small fraction of that width (5%) and stores it in
a variable that is used to decrease or increase the proposed width. The algorithm used is
Divide by Thirds.
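
The acceptance test described above can be sketched in C as follows. This is a minimal
illustration and not the thesis code: the function and macro names are hypothetical, and
only the two thresholds (1.002 and 0.9) and the 5% step fraction are taken from the text.

```c
#include <stdbool.h>

/* Thresholds quoted in the text; the macro names are hypothetical. */
#define RATIO_HIGH    1.002
#define RATIO_LOW     0.9
#define STEP_FRACTION 0.05   /* 5% of the estimated width */

/* Ratio of the maximum approximation error on a segment to epsilon. */
double error_ratio(double max_err, double epsilon)
{
    return max_err / epsilon;
}

/* True when the averaged estimate is outside the accepted band,
 * so the width still has to be refined. */
bool needs_refinement(double max_err, double epsilon)
{
    double r = error_ratio(max_err, epsilon);
    return r > RATIO_HIGH || r < RATIO_LOW;
}

/* Amount by which the proposed width is first increased or decreased. */
double initial_step(double estimated_width)
{
    return STEP_FRACTION * estimated_width;
}
```

A ratio inside the band [0.9, 1.002] means the averaged estimate is accepted as is; anything
outside it is handed to the refinement step with the 5% step as its starting increment.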

In  addition  to  the  steps  taken  above,  the  program  was  modified  to 
exaggerate  the  error  calculated  from  the  approximation.  This  only  happens  in  the  final 
steps  when  trying  to  Pinpoint  the  end  of  the  segment.  This  has  two  effects: 

(1) It drastically reduces the number of steps required. Many of the estimates fall
short, and exaggerating the error when the segment falls short reduces the distance that
Pinpoint has to travel to exceed $\varepsilon$. Saving two or three steps per segment adds
up to about 100 steps if the segmentation produces 33 segments.

(2) Exaggerating the approximation error makes some of the segments slightly smaller
than they would be if the approximation error were not adjusted. However, remember that the
final segment is usually truncated and can therefore absorb the extra space created by
making the previous segments narrower. In a way, decreasing the size of each segment by a
small amount builds in a little slack per segment, because the approximation error is
slightly smaller than $\varepsilon$. The truncated segment is not optimized and can be
enlarged to accommodate the small adjustments in all the other segments. Only in the very
high precision segmentations does the number of segments increase noticeably. The increase
is on the order of single digits when considering hundreds or thousands of segments. This
compromise is acceptable because it dramatically reduces the number of calls to chebyRemez,
as shown in the last columns of Table 4 and Table 5. Further, it does not increase the
number of segments by any significant amount.

This hybrid method produces by far the best solution among all the algorithms
discussed. Consider the function $\sqrt{-\ln(x)}$: as shown in Table 4, only 98 calls
to chebyRemez were needed to achieve segmentation, which is 0.0098% of the steps that
brute force would take when N=1,000,000.

C.  MATLAB  RESULTS 

MATLAB  was  used  to  segment  the  numeric  functions  into  piecewise  quadratic 
segments.  The  uniform  and  non-uniform  segmentation,  number  of  segments  required  for 
each  of  the  numeric  functions  and  a  comparison  of  the  segmentation  algorithms  have 
been  discussed  in  part  B  above. 

The coefficients that represent the piecewise quadratic approximation for the
segments are computed and stored in a file. These files can store the coefficients and
segment boundaries in hexadecimal, binary or decimal form. When the NFG is implemented in
the floating point number representation, it uses the coefficients saved as decimal values.
When the NFG uses the fixed point number system, the coefficients are saved as
hexadecimal values.

Table 8 shows the data in the memory file for the non-uniform segmentation of
$\cos(\pi x)$. At the top of each memory file is a decimal number that states the number of
segments in the memory file. This is useful when reading the file to determine how many
elements need to be read into the program.

460
0.004610004610   -4.934645942292   -0.000000373180   1.000000000116
0.007928007928   -4.933831217369   -0.000007964422   1.000000018030
0.010830510831   -4.932649394425   -0.000026748228   1.000000092899
0.013492513493   -4.931191898804   -0.000058351444   1.000000264447
0.015989015989   -4.929503741104   -0.000103932118   1.000000572352

460
0x000000970f858467  0xfffd885d8592426b  0xfffffffffcde9a16  0x0000800000003ff4
0x00000103c8f362f9  0xfffd887837fab57d  0xffffffffbd3088d1  0x000080000026b814
0x00000162e4e8e873  0xfffd889ef1d427ca  0xffffffff1f9e9e52  0x0000800000c77ff1
0x000001ba1f681879  0xfffd88ceb4302ae2  0xfffffffe16833aaf  0x000080000237e533
0x0000020bed96624f  0xfffd8906057b39b1  0xfffffffc982779e8  0x0000800004cd1dc1

Table 8. Sample memory files (Decimal and Hexadecimal). Non-uniform segmentation
of $\cos(\pi x)$, N=1,000,000 and $\varepsilon = 2^{-33}$.


The first column shows the segment end points. The next three columns are the
coefficients of the quadratic polynomial that determines values in the segment. The order
is $c_2$, $c_1$ and $c_0$ from left to right. Equation (0.4) shows the relationship of the
coefficients to the polynomial.

$$f(x) \approx p = c_2 x^2 + c_1 x + c_0 \qquad (0.4)$$


The  hexadecimal  values  in  Table  8  use  a  fixed  point  number  system,  where  the 
first  17  bits  are  the  integer  including  a  sign  bit  and  the  last  47  bits  are  the  fraction.  The 
number  is  a  two’s  complement  number.  The  number  system  is  discussed  in  section  III. 
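
To make the hexadecimal format concrete, the sketch below converts a 64 bit word to a real
value and back, assuming the layout stated above (17 integer bits including the sign, 47
fraction bits, two's complement). The function names are hypothetical and the code is an
illustration rather than the conversion routine used in the thesis.

```c
#include <stdint.h>

#define FRACTION_BITS 47   /* 17 integer bits (with sign) + 47 fraction bits */

/* Interpret a 64 bit two's complement word with 47 fraction bits as a real value. */
double fixed_to_double(uint64_t word)
{
    /* Reinterpret the bit pattern as signed; this relies on the usual
     * two's complement wrap-around of the conversion. */
    int64_t s = (int64_t)word;
    return (double)s / (double)(1ULL << FRACTION_BITS);
}

/* Inverse conversion, truncating toward zero (valid for |v| < 2^16). */
uint64_t double_to_fixed(double v)
{
    return (uint64_t)(int64_t)(v * (double)(1ULL << FRACTION_BITS));
}
```

Applied to the first hexadecimal row of Table 8, `fixed_to_double(0xfffd885d8592426b)` gives a
value close to the decimal coefficient -4.934645942292 listed for the same segment.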


D.  SUMMARY 

MATLAB is used to segment the suite of functions in Table 1. The segmentation
algorithm results in the fewest segments for a given accuracy constraint. In each segment,
the minimax quadratic approximation is obtained by computing the coefficients with the
Remez algorithm, which gives a better approximation than MATLAB's built-in polyfit
function.

The  Remez  algorithm  is  slow;  therefore  various  methods  were  investigated  to  find 
an  efficient  algorithm  to  compute  the  segmentation  of  the  numeric  functions.  A  hybrid  of 
three  algorithms  is  chosen  as  the  best  algorithm  to  compute  fast  segmentation  of  the  suite 
of  functions.  Table  4  uses  only  one  function,  but  summarizes  the  results  of  the 
comparisons. 

Quadratic segmentation at high accuracy ($2^{-33}$) results in over 96% fewer
segments than linear approximation, as shown in Table 2.

The segmentation is the first step to building the NFG. Next, the circuit has to be
designed in hardware. In section III, we look at the components that make up the NFG
circuit.

III. NFG CIRCUIT


A.  CIRCUIT  OVERVIEW 

Figure 1 is duplicated here from section I for convenience. Figure 1 shows three
multipliers, the segment index encoder, the coefficients table and one 3-input adder. These
are the hardware components for the NFG.


Figure 1. NFG Overview (duplicated from Section I).

The architecture has three 64 bit multipliers and one 3-input 64 bit adder. The
adder and multiplier can be implemented in two's complement or floating point by using
the prescribed math operators. To generate a floating point multiplier or adder, the
operands need to be declared as doubles or floats. To generate a two's complement
multiplier or adder, each operand needs to be declared as an integer, e.g. int64_t or int.
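
The sketch below shows one plausible way the three multiplications and the 3-input addition
could be written so that the operand types select the arithmetic. It is plain C for
illustration only: the function names are hypothetical, the assignment of products to
multipliers is an assumption, and the integer version applies no fraction scaling.

```c
#include <stdint.h>

/* Floating point datapath: three multiplies feeding one 3-input add. */
double nfg_eval_double(double x, double c2, double c1, double c0)
{
    double x2 = x * x;      /* multiplier 1 */
    double t2 = c2 * x2;    /* multiplier 2 */
    double t1 = c1 * x;     /* multiplier 3 */
    return t2 + t1 + c0;    /* 3-input adder */
}

/* Same structure with integer operands (plain integers, no scaling). */
int64_t nfg_eval_int(int64_t x, int64_t c2, int64_t c1, int64_t c0)
{
    int64_t x2 = x * x;     /* multiplier 1 */
    int64_t t2 = c2 * x2;   /* multiplier 2 */
    int64_t t1 = c1 * x;    /* multiplier 3 */
    return t2 + t1 + c0;    /* 3-input adder */
}
```

The two functions differ only in their declared types, which mirrors the point made in the
text that the declaration decides which kind of multiplier and adder is generated.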

The segment index encoder is designed using a priority selector macro supplied
by SRC and provided as a user callable macro. In uniform segmentation, the desired index
is obtained instead by multiplying the input by a segment density number.

1.  Number  System 

To  determine  the  number  system  to  use,  we  need  to  know  the  range  of  values  the 
NFG  will  have  to  handle.  An  analysis  of  the  domain,  range  and  coefficients  provides  the 
boundaries  for  the  number  system. 

Table 9 shows the analysis of the numeric functions. The numeric functions are
ordered from the most demanding to the least demanding. At the top, $\sqrt{-\ln(x)}$
requires 15 bits to accommodate any integer value the hardware may encounter, based on
the range of values and coefficients.

The columns Max and Min are the maximum and minimum values among all
coefficient values and all possible domain and range values, i.e. any number that would
appear in the computation done by the NFG. The column labeled log2(abs(largest one))
is obtained by comparing the absolute values of Max and Min, choosing the larger, and
computing the logarithm base 2 of this value. The final column shows the
maximum number of bits required to represent the largest possible integer the NFG may
encounter. Note that these values have been computed for a specific domain, and
different domains may require more or fewer bits. Table 2 shows the domains for each of
the numeric functions that appear in Table 9.
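
The last column of Table 9 appears to be the ceiling of the base-2 logarithm of the larger
magnitude. The sketch below reproduces it under that assumption; the function name is
hypothetical, and a doubling loop is used in place of a library logarithm.

```c
/* Smallest n with 2^n >= max(|max_val|, |min_val|); 0 when that value is <= 1.
 * Assumed to match the "Number of bits Required" column of Table 9. */
int integer_bits_required(double max_val, double min_val)
{
    double a = max_val < 0.0 ? -max_val : max_val;
    double b = min_val < 0.0 ? -min_val : min_val;
    double largest = a > b ? a : b;

    int bits = 0;
    double power = 1.0;
    while (power < largest) {   /* double until the magnitude is covered */
        power *= 2.0;
        bits++;
    }
    return bits;
}
```

With the Max and Min values of the first row of Table 9 (24047.26212 and -196.4301496) the
routine returns 15, the figure quoted in the text.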

The NFG requires at least 15 bits to represent the largest integer that may be
encountered when computing the approximation of a numeric function. Therefore, the
number system chosen has a 16 bit integer and a 16 bit fraction (i.e. a 32 bit
implementation). A 64 bit implementation has a 32 bit integer and a 32 bit fraction. The
decimal point in the two's complement number system is interpreted to be between bit 32
and bit 31 of a 64 bit number, where the LSB is bit 0.

The 64 bit implementation would benefit from using a 16 bit integer and a 48 bit fraction;
however, the number of segments required is very large and these implementations were
not investigated in detail. As an example, $\cos(\pi x)$ at $\varepsilon = 2^{-49}$ and
N=5,000,000 would require 19,167 segments.


Function                             Max            Min             log2(abs(largest one))   Number of bits required
sqrt(-ln(x))                         24047.26212    -196.4301496    14.55358503              15
-(x log2(x) + (1-x) log2(1-x))       360.5900787    -185.0149295    8.494215892              9
tan^2(pi x) + 1                      78.89563478    -26.88144904    6.301873574              7
sin(e^x)                             94.22597144    -96.6450472     6.594623895              7
tan(pi x)                            19.70724959    -3.570442576    4.300654538              5
ln(x)                                4.934751084    -4.934751014    2.302977315              3
sin(pi x)                            1.569925541    -4.934645908    2.302946566              3
cos(pi x)                            1.569925541    -4.934645908    2.302946566              3
1/x                                  2.997676487    -2.995354324    1.583844694              2
log2(x)                              2.882537585    -2.162615784    1.527339419              2
2^x                                  1.093679242    0.004061004     0.129189682              1
sqrt(x)                              2              -0.124634328    1                        1
1/sqrt(x)                            2              -1.247861112    1                        1
[illegible]                          1.414213562    -0.414997832    0.5                      1
1/(1 + e^-x)                         1              -0.045379009    0                        0

Table 9. Maximum and minimum values encountered for each function in the NFG
computation. Last column is the number of bits required for the integer portion.

2. 16, 32, 64 Bit Accuracy vs. 16, 32, 64 Bit Architecture

The  accuracy  and  architecture  can  be  built  to  match  each  other.  Consider  a  set  of 
values  of  16  bit  accuracy.  Based  on  the  number  system,  we  would  need  16  bits  for 
integer  and  16  bits  for  fraction  (which  is  the  accuracy).  An  architecture  that  matches 
these  needs  has  to  have  32  bit  words;  the  architecture  would  be  32  bits.  One 
implementation  in  the  NFG  was  designed  this  way.  Another  design  was  built  with  32  bit 
accuracy  (32  bits  fraction  and  32  bits  integer)  and  therefore  the  width  of  the  architecture 
is  64  bits. 

Another  way  to  build  the  NFG  is  to  use  64  bit  architecture  for  all  accuracies.  This 
means  that  all  values  will  be  represented  in  64  bits.  Consider  a  value  that  is  accurate  to 
16  bits.  In  this  case,  32  bits  are  available  to  represent  the  fraction,  but  the  fraction  will 
only  be  accurate  to  16  bits.  The  rest  of  the  bits  are  irrelevant,  but  the  hardware  operates 
on  all  64  bits.  The  architecture,  in  this  case,  does  not  match  the  accuracy. 


B.  CIRCUIT  COMPONENTS 

1.  Segment  Index  Encoder 

The segment index encoder accepts an input value x (within the domain of the
NFG) and outputs a number used to obtain the quadratic coefficients. The
number is the index of the segment to which x belongs. This only applies to the
non-uniform segmentation.

User callable macros available in the SRC are used to implement a priority
selector in the NFG. The prioritized selectors work as an "if-else-if" sequence. A wide
range of options is available for 8, 16, 32 and 64 bit wide values. Each of these bit
widths can be implemented with 4, 8, 16, 32, 64, 128 or 256 elements. For
example, choosing 64 bits and 256 elements is equivalent to a priority encoder of 256
64 bit words.

The prioritized selector requires a Boolean condition and an assignment for a true
condition. In the NFG, the Boolean condition is the comparison of the segment endpoint
to the input value (the numeric function argument, x). If x is less than the segment
endpoint, then x belongs to that segment and the corresponding assignment value is the
index of the segment. Since x lies in the chosen segment, the index of the segment is used
to access the polynomial coefficients that approximate the numeric function in that segment.
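
In software terms, the prioritized selector behaves like the loop sketched below. This is a
plain C emulation for illustration, not the SRC macro itself; the function name is
hypothetical, and the comparison uses "less than or equal" as in the code of Table 10.

```c
#include <stddef.h>

/* Return the index of the first segment whose endpoint is >= x, mimicking an
 * if-else-if chain. The last segment acts as the default case. */
int segment_index(double x, const double *endpoints, size_t n_segments)
{
    for (size_t i = 0; i + 1 < n_segments; i++) {
        if (x <= endpoints[i])
            return (int)i;          /* first true condition wins */
    }
    return (int)(n_segments - 1);   /* default assignment */
}
```

The hardware selector evaluates all comparisons in parallel and picks the first true one,
whereas this loop tests them one after another; the result is the same.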

The types of selectors for a given segmentation are carefully chosen so as not to
use more FPGA area than necessary. For example, consider a numeric function that has
been segmented into 48 segments. The smallest single selector that would accommodate this
number of segments is the 64 element selector. The 64 element selector has room for
another 16 elements; since we do not need them, those 16 elements are wasted. A better
approach is to combine one 16 element selector and one 32 element selector. This saves
FPGA area and allows us to build exactly the selector we need. An example of the described
code is provided in Table 10.


//--Select Which Switch Statement will be executed--//
if ( varx <= 0.333333333333333310)
    sel = 1;
else if ( varx <= 0.500000000000000000)
    sel = 2;

//--Switch Statement--//
switch (sel)
{
case 1 :
    select_pri_64bit_32val( varx <= 0.010351035103510351,  0,
                            varx <= 0.020802080208020803,  1,
                            varx <= 0.031203120312031204,  2,
                            ...
                            varx <= 0.322882288228822870, 30,
                            31,
                            &indx);
    break;
case 2 :
    select_pri_64bit_16val( varx <= 0.343734373437343750, 32,
                            varx <= 0.354135413541354140, 33,
                            varx <= 0.364586458645864590, 34,
                            ...
                            varx <= 0.479147914791479170, 45,
                            varx <= 0.489598959895989620, 46,
                            47,
                            &indx);
    break;
}

Table 10. Code that uses two selectors to implement 48 segments.

To implement a selector with more than 256 elements, a combination of the available
selectors can be used. In the .mc file, an if-else-if statement precedes the set of
selectors and selects which one of the selectors will be used to encode the index.

More detail on the various selectors available in the SRC is provided in Appendix
A.10 of [17].


2.  Indexing  in  Uniform  Segmentation 

In uniform segmentation, the appropriate segment is computed by multiplying the input
value, x, by a segment density number, which represents the number of segments per unit
length. Instead of using a segment index encoder, x is multiplied by the segment density
number and the integer part of the result is the index that is applied to the coefficient
arrays to access the coefficients for the quadratic approximation.

The segment density number is obtained by dividing the length of the entire interval by
the number of segments and inverting the result.


For example, consider an interval [0, 0.5] with uniform segmentation. If 100
segments are realized, then the number used to multiply all inputs is

$$\left(\frac{0.5}{100}\right)^{-1} = 200.$$

If the input is 0.3356, then the coefficients will be extracted from the OBM array using
the index 67, since $\lfloor 0.3356 \times 200 \rfloor = \lfloor 67.12 \rfloor = 67$.
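
The index computation can be sketched in C as below. The function names are hypothetical,
and the offset handling is an assumption based on the description that follows in the text
for domains that do not start at zero (the offset is taken to be the raw index of the lower
end of the domain).

```c
/* Segments per unit length: the inverse of the width of one segment. */
double segment_density(double lo, double hi, int n_segments)
{
    return (double)n_segments / (hi - lo);
}

/* Index into the coefficient arrays for a domain that starts at lo.
 * The offset term is zero when lo is 0. */
int uniform_index(double x, double lo, double density)
{
    int raw    = (int)(x * density);    /* truncation equals floor for x >= 0 */
    int offset = (int)(lo * density);   /* raw index of the start of the domain */
    return raw - offset;
}
```

For the example above, `segment_density(0.0, 0.5, 100)` gives 200 and
`uniform_index(0.3356, 0.0, 200.0)` gives 67.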


If the interval of the domain starts at a non-zero value, then the index obtained
from the above method will be offset. Simply subtract the offset from the index obtained
to get the true index into the array. This extra step increases the pipeline depth of the
NFG. The effect is greater in the floating point implementation than in the fixed point
implementation.

a. Floating Point Implementation

The uniform segmentation of the NFG in floating point requires three
files: main.c, <subroutine>.mc and memoryFile. An array containing floating point
values of the endpoints and coefficients of the uniform segmentation is passed into the
OBM via a DMA call. The sample points for testing the NFG are placed in a separate
array and passed into OBM via a second DMA call. The memory file contains three
numbers at the beginning of the file:

•  The  number  of  segments  (which  is  also  the  number  of  sets  of 
coefficients  in  the  memory  file).  Stored  as  an  int. 

•  The  segment  density  number  that  is  used  to  determine  the 
segment  that  any  x  input  belongs  to.  Stored  as  a  double. 

•  The  offset  value  (needed  for  functions  that  have  an  interval 
with  a  non-zero  begin  point) 

b.  Fixed  Point  Implementation 

The uniform segmentation in the fixed point implementation works similarly to
the floating point implementation. Three files are needed: main.c, <subroutine>.mc and
memoryFile. The coefficients in the memory file and in the computation are two's
complement hexadecimal numbers, as described in the section on number systems. The
memory file contains three numbers at the beginning of the file:

•  The  number  of  segments  (which  is  also  the  number  of  sets  of 
coefficients  in  the  memory  file).  Stored  as  an  int. 

•  The  segment  density  number  that  is  used  to  determine  the 
segment  to  which  any  x  input  belongs.  Stored  as  an  int64_t. 

•  The  offset  value  (needed  for  functions  that  have  an  interval 
with  a  non-zero  begin  point) 

The computation of the index, and therefore the segment, is done
in two's complement. One major problem exists in this multiplication: the product is 128
bits, but the architecture only allows 64 bits to be stored, so the upper 64 bits
are truncated. In addition, since the decimal point in the operands is 32 bits from the
LSB, the decimal point in the product is between bit 63 and bit 64 (when the LSB is
considered to be bit 0). This means we lose all integer values, and the entire product that
is stored is only the fraction portion of the true 128 bit product.

To represent the full range of numbers in the numeric functions, we need
to retrieve some of the upper bits. The segment density number is normally a whole
number (with no value in the fraction); occasionally the segment density number may
have a small but negligible fraction. We can perform a 16 bit logical shift right on the
segment density number without a large loss. This opens up 16 bits in the integer part of
the product, which is really the index into the array of coefficients. 16 bits is enough to
represent over 65,000 segments (see footnote 2). The product is then shifted 48 bits to the
right to give an index number (index numbers must be whole numbers). This method is prone
to rounding errors, which occasionally result in the wrong index.
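
The shift scheme can be sketched in C as follows, assuming operands with 32 integer and 32
fraction bits as described above. The helper and function names are hypothetical; unsigned
arithmetic is used so that the multiplication keeps exactly the low 64 bits of the product,
and the sketch only covers non-negative inputs with a density below 65,536.

```c
#include <stdint.h>

/* Convert a non-negative real value to 32.32 fixed point (hypothetical helper,
 * truncating). */
uint64_t to_q32_32(double v)
{
    return (uint64_t)(v * 4294967296.0);   /* 2^32 */
}

/* Index = floor(x * density), computed from the low 64 bits of the product. */
uint64_t fixed_uniform_index(uint64_t x_q32_32, uint64_t density_q32_32)
{
    uint64_t d = density_q32_32 >> 16;   /* density now has 16 fraction bits */
    uint64_t p = x_q32_32 * d;           /* 32 + 16 = 48 fraction bits, 16 integer bits */
    return p >> 48;                      /* discard the fraction, keep the index */
}
```

With x = 0.3356 and a density of 200 this returns 67, matching the floating point example.
Because x is truncated when it is converted, inputs very close to a segment boundary can
land in the neighbouring segment, which is the rounding problem mentioned in the text.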

Other  schemes  have  to  be  implemented  when  both  operands  have  a 
significant  amount  of  data  in  the  fraction.  The  section  on  the  two’s  complement 
multiplier  discusses  other  schemes  in  more  detail. 

3.  Coefficients  Table 

The coefficients of the quadratic polynomial for each segment are stored in an array
in the OBM banks on the MAP® board. The segment index encoder provides an index
into the array. The coefficients are accessed and applied to the quadratic polynomial along
with the x value that is being evaluated.

4.  Multiplier 

The  three  multipliers  shown  in  Figure  1  are  either  implemented  in  two’s 
complement  or  floating  point.  Floating  point  operations  increase  the  pipeline  depth,  but 
are  easier  to  code. 


2 The largest number of segments is 34,483, which is the uniform segmentation of
$\sqrt{-\ln(x)}$ when $\varepsilon = 2^{-n}$ [exponent illegible]. Table 12 shows the number
of segments for various functions when using uniform segmentation.

a. Floating Point Multiplier

The floating point multipliers implemented in the NFG are implicitly
instantiated. The operands are declared as doubles and, when the multiplication operator
is applied in the .mc file, the MAP® compiler builds the floating point multiplier.

b. Two's Complement Fixed Point Multiplier

The  three  main  categories  of  interest  are: 

•  Fixed  point  two’s  complement  multiplier 

•  Floating  point  multiplier 

•  Signed  Magnitude  multiplier 

The signed magnitude multiplier was not built. The fixed point multipliers
implemented in the NFG are either implicitly instantiated or explicitly built in HDL. The
two's complement fixed point multiplier was built in Verilog and VHDL, and was also
implicitly instantiated by the SRC MAP®, with various levels of success.

To implicitly instantiate the two's complement multiplier, the operands are
declared as integer values (int64_t), and when the multiplication operator is applied in
the <subroutine>.mc file, the MAP® compiler builds the appropriate multiplier.

This method has two major problems: (1) The SRC 64 bit x 64 bit multiplier
does not produce a 128 bit product. Instead, it produces a 64 bit product that is
composed of only the lower 64 bits. (2) If the MSB at the cutoff is a binary 1, the
number appears as a negative number, even though it is really a positive number.

Because of the number system chosen, i.e. 32 bits of integer and 32 bits of
fraction, multiplication results in a product that represents only the fraction portion of
the multiply; the integer portion, bits 65 through 128, is truncated.

One way to overcome this limitation is to choose a different number
system that has fewer bits to represent the fraction, but this reduces the accuracy of the
NFG and it still limits the size of the integer. The integer must be at least 16 bits to
provide full coverage of the values encountered in the suite of functions in Table 1. One
implementation of the NFG was built by shifting the operands right 8 bits before the
multiply. This allowed 16 bits to be represented in the integer portion of the product.
In this case, the best accuracy that one would expect to attain is 24 bits, i.e. $2^{-24}$.
Because the operands are truncated, error is propagated to the output and the accuracy is
not reliable. Shifting values presents another problem: if the MSB is a binary 1, the right
shift operation will sign extend the number. This has unwanted effects. A product may
be positive, but if the bit right before the cutoff point is a binary 1, the shifted value
will be sign extended and we have to zero out the leading bits. More detail on the results
of this method can be found in section V, where the implementation results are covered.
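
The two effects described above can be reproduced in a few lines of C. The sketch assumes
operands with 32 integer and 32 fraction bits and non-negative values; the function and macro
names are hypothetical, and unsigned arithmetic stands in for a multiplier that keeps only
the lower 64 bits.

```c
#include <stdint.h>

#define ONE_Q32_32 (1ULL << 32)   /* the value 1.0 with 32 fraction bits */

/* A 64 x 64 multiply that keeps only the low 64 bits: with 32.32 operands
 * every retained bit is a fraction bit, so the integer part is lost. */
uint64_t mul_low64(uint64_t a_q32_32, uint64_t b_q32_32)
{
    return a_q32_32 * b_q32_32;   /* wraps modulo 2^64 */
}

/* Shift each operand right 8 bits first: 24 + 24 = 48 fraction bits remain,
 * leaving 16 integer bits in the 64 bit product. */
uint64_t mul_preshift8(uint64_t a_q32_32, uint64_t b_q32_32)
{
    return (a_q32_32 >> 8) * (b_q32_32 >> 8);
}
```

For 1.5 x 2.0, `mul_low64` returns 0 because the true product 3.0 has no fraction part,
while `mul_preshift8` returns 3.0 with 48 fraction bits. The price is that the lowest 8
fraction bits of each operand are discarded before the multiply.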

The best solution is to build an HDL multiplier that can compute the result
in the number system chosen and therefore keep the desired accuracy and the best range
for the integer without any sacrifice in accuracy. The problem with this method is that it
requires a long carry chain.

Verilog  or  VHDL  can  be  used  to  explicitly  build  the  multipliers.  Several 
multipliers  were  built  in  VHDL  and  Verilog.  The  HDL  files  do  not  meet  the  timing 
requirements  while  running  the  NFG,  although  the  program  compiles  without  any  errors. 
Simulation  using  Modelsim  and  Xilinx  ISE  showed  that  the  design  for  the  multipliers 
was  correct.  The  problem  appears  to  be  the  carry  chain  that  is  required  to  add  all  the 
partial  products. 

Further investigation is needed to determine whether the problem is indeed in the
carry chain and whether a carry save adder (CSA) followed by a carry lookahead adder
(CLAH), neither of which was built, is required.

5.  Adder 

The NFG requires a 3-input adder. As with the multipliers, floating
point and fixed point adders are instantiated by the MAP® compiler.

C. SUMMARY

The NFG circuit requires three multipliers and one 3-input adder. Floating point
implementation is easier than the fixed point implementation, but requires more
hardware. The multipliers can be instantiated implicitly or, in the case of fixed point, the
user has the option to explicitly build the multiplier in HDL.

Fixed  point  arithmetic  presents  some  challenges  with  rounding  and  truncating  of 
the  operands  and  results. 

The circuit design was built on the SRC-6E reconfigurable hardware. Section IV
provides a background on the SRC-6E system to give a better understanding of the
hardware and software system.

IV. SRC BACKGROUND


A.  INTRODUCTION 

The  late  Seymour  Cray  established  SRC  Computers  Incorporated  in  Colorado 
Springs,  Colorado  in  1996.  SRC  developed  the  IMPLICIT+EXPLICIT™  architecture 
that  is  designed  to  provide  increased  performance  over  conventional  processors  [16]. 

1.  IMPLICIT+EXPLICIT™  Architecture 

The IMPLICIT+EXPLICIT™ architecture allows the full integration of Dense
Logic Device (DLD) technology, such as ASIC devices or microprocessors, with
reconfigurable Direct Execution Logic (DEL). SRC's Carte™ Programming
Environment lets the programmer choose which part of the code executes in the fixed logic
(i.e. microprocessor, implicit) and which part executes in the reconfigurable hardware
(explicit) [16]. Figure 13 is an overview of the SRC IMPLICIT+EXPLICIT™
architecture.


[Figure 13 diagram. Labels: Fortran and C sources; Carte™ Programming Environment;
Implicitly Controlled Device (dense logic device, higher clock rates, typically fixed
logic, µP, DSP, ASIC, etc.); Explicitly Controlled Device (direct execution logic, lower
clock rates, typically reconfigurable, FPGA, CPLD, OPLD, etc.); Memory; Control; Unified
Executable.]

Figure 13. IMPLICIT+EXPLICIT™ architecture [16].

The user can program in the Carte™ Programming Environment in C or
FORTRAN instead of designing logic. A single executable is generated that specifies
which operations execute on which parts of the system. If the programmer desires to
design the logic, he/she can design in a schematic capture program and generate VHDL
or Verilog files that are used as macros. The user can also code the Verilog and VHDL
files directly and use them as macros. More information on what is needed to implement
macros is provided in the section on software code [16].


B.  HARDWARE 

Figure  14  shows  3  Xilinx  XC2V6000  FPGAs  on  the  MAP®,  2  sets  of  memory  and 
some  ROM. 


[Figure 14 diagram. Recoverable labels: two links marked "1400 MB/s sustained".]

Figure 14. MAP® Hardware overview diagram [18].

There are three FPGAs. The user can program two of the FPGAs, while the third
is used as a controller. The FPGAs are Xilinx Virtex-II XC2V6000 devices with a -4 speed
grade. There are 6 banks of dual ported On-board Memory (OBM) with a total of 24 MB
(high-speed local memory). The OBM RAM is connected to the two user logic FPGAs
via a 4800 MB/s bus (the OBM RAM is also connected to the third FPGA via another
4800 MB/s bus).

The two FPGAs are connected directly to each other with access to a 4 MB dual
ported memory bank for inter-chip data exchange on a 4800 MB/s bus. The two FPGAs
have two General Purpose I/O (GPIO) ports for moving data directly off the MAP®,
connected via a 2400 MB/s bus.

Internal to each user FPGA are an additional 144 BRAM blocks of 18 Kb each [19], for a
total of 2,592 Kb of BRAM. BRAM is fast since it is on the FPGA chip.


C.  SOFTWARE  CODE 

A user program consists of two C programs, main.c and <subroutine>.mc, as well
as "helper" files.

1.  main.c 

The  main  routine  is  a  C  program  that  runs  on  the  SRC’s  Intel  processor.  The 
main  routine  contains  the  declarations  for  the  subroutine  functions  and  makes  the 
subroutine  functions  visible  to  the  Intel  processor. 

To effectively use the MAP® hardware, we need to partition the code and select
the portions that will provide improved overall performance when executed on the MAP®
processor. These include loops that can be pipelined, or manipulation of bits that are in a
long bit stream of data [20]. They are placed in a C program described in the next
section.




2.  <subroutine>.mc 

These are the files that contain the function subroutine that is called from the main
routine to execute on the MAP® boards. The code in the .mc files should not contain any
external calls outside the MAP®, with the exception of SRC-defined or user-defined
macros.

The .mc file does not allow any system calls or runtime functions that require
intervention from the operating system. The only exception is the printf statement, which
is ignored at compile time except in debug mode, where it is very handy.
This means that a .mc file cannot include any system header
files besides the libmap.h header file, which is the only runtime library allowed in the
MAP® [16].


3.  Makefile 

Many files are used during compilation. The Makefile identifies the files and
commands that are used by the compiler. The Makefile allows the programmer to set the
source code preprocessing environment variables, C compiler flags, MAP® compiler
flags and simulation compiler flags [16]. SRC provides a template that can be tailored
to the specific needs of the program.

4.  Macros 

Macros allow the programmer to design in HDL. This is more flexible than using the
<subroutine>.mc file alone. Macros let the programmer create specific and unique
hardware that can manipulate anything from wide bit values down to single bits.

To implement a macro, the Makefile needs to know where to find the HDL files
and the macro support files. The following are required for macros:

a.  info

The info file provides the MAP® compiler with the name of the macro and
the relationship between the call and the macro instantiation. The info file defines the
macro's name, its characteristics (such as whether the macro is pipelined), whether it
interacts with external systems (outside the code block), the latency of the circuit
specified by the macro, and the number of inputs and outputs. The signal names and macros
in the Verilog code generated by the MAP® compiler require the info file in order to
correctly map the operators and calls in the source program [16].

The  info  file  can  also  be  used  to  define  the  behavior,  in  C,  that  the 
hardware  is  expected  to  perform.  This  feature  is  available  for  the  debug  mode  and  uses 
the  Intel  processor  to  emulate  the  hardware  that  the  programmer  intends  to  design  on  the 
MAP®. 

If  multiple  macros  are  used,  the  user  only  needs  one  info  file.  The 
information  associated  with  the  different  macros  must  be  put  into  the  one  info  file. 

b.  blk.v 

The black box file, blk.v, describes the macro's interface. It is a simple file,
written in a Verilog style, that gives the number of bits for each input and output.

If  multiple  macros  are  used,  the  user  must  add  the  interface  information 
into  a  single  blk.v  file. 


c.  HDL  Files 

The HDL files can be written in VHDL or Verilog. They are specified in
the Makefile.


d.  Location  for  NGO  Directory 

This location must be specified in the Makefile to identify the directory
that will contain all the NGO files. The recommended practice is to put the NGO
directory in the same directory as all the macro information, including the info, blk.v
and HDL files.

The  macros  describe  the  logical  design  at  a  high  level.  The  NGO  files  are 
used  by  NGDbuild  to  create  an  NGD  file.  The  NGD  file  describes  the  logical  design  in 
terms  of  Xilinx  primitives  (basic  elements  in  the  FPGA). 

D.  SUMMARY 

The  SRC  system  provides  flexibility,  and  a  user-friendly  interface  for  designing 
specialized  hardware. 

Various implementations of the NFG were built on the SRC system. The results
are documented in Section V.

V.  IMPLEMENTATION  RESULTS


A.  UNIFORM  SEGMENTATION 

Uniform segmentation is easier to implement in terms of programming the
<subroutine>.mc file. Appendix C shows the code for main.c and subr.mc for uniform and
non-uniform segmentation.


1.  Floating  Point  Implementation 

Two major advantages of the uniform segmentation floating point implementation
are (1) the multiplier does all the work of moving the decimal point and (2) once the file
is compiled, any function can be computed without recompiling; the only
requirement is to change the memory file.

The disadvantage is that floating point operations require a large amount of hardware.
The complexity of using floating point is hidden from the user, but is evident in the
number of multipliers consumed and the pipeline depth required. Figure 15 shows the
summary report after the compile process is completed (i.e., after the user types make hw).


###################################################################### 
##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  55: 

clocks  per  iteration:  1 

pipeline  depth:  84 

###################################################################### 


###############  PLACE  AND  ROUTE  SUMMARY  ####################

Number  of  Slice  Flip  Flops:  17,647  out  of  67,584  26%

Number  of  4  input  LUTs:  9,299  out  of  67,584  13%

Number  of  occupied  Slices:  11,390  out  of  33,792  33%

Number  of  MULT18X18s:  64  out  of  144  44%

freq  =  100.2  MHz

######################################################################


Figure 15. NFG pipeline depth and place and route summary.

The SRC has user callable macros that are summarized in Appendix A of [20].
Figure 16 shows the difference between the pipeline depth of the NFG and that of the SRC
user callable macro. The pipeline depth for the NFG is 20% less than that of the user
callable macro.

Figure  16  also  shows  the  place  and  route  information  associated  with  mapping 
both  the  NFG  and  SRC’s  user  callable  cosine  macro.  Comparing  Figure  15  with  Figure 
16,  one  can  see  the  hardware  requirements  have  increased  due  to  adding  SRC’s  user 
callable  macro. 
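
The effect of pipeline depth on run time can be estimated with the usual model for a fully pipelined loop: the first result appears once the pipeline has filled, and each later result follows one iteration interval after the previous one. The sketch below is a minimal illustration under that assumption. The function name `pipeline_clocks` is hypothetical, and data transfer between the processor and the MAP® is ignored.

```c
#include <stdint.h>

/* Estimated clock count for a pipelined loop. This is a simplified model,
 * not an SRC API: fill the pipeline once, then produce one result per
 * iteration interval. */
uint64_t pipeline_clocks(uint64_t depth, uint64_t clocks_per_iter, uint64_t n)
{
    if (n == 0) {
        return 0;  /* nothing to compute */
    }
    return depth + (n - 1) * clocks_per_iter;
}
```

Under this model, with depths of 84 (NFG) and 105 (cosine macro) and one clock per iteration, one million inputs take 1,000,083 and 1,000,104 clocks respectively, so the difference in depth matters most for short input streams.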


###################################################################### 
##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  55: 

clocks  per  iteration:  1 

pipeline  depth:  84 

loop  on  line  72 : 

clocks  per  iteration:  1 

pipeline  depth:  105 

###################################################################### 
###############  PLACE  AND  ROUTE  SUMMARY  #################### 


Number  of  Slice  Flip  Flops:  27,557  out  of  67,584  40%

Number  of  4  input  LUTs:  17,318  out  of  67,584  25%

Number  of  occupied  Slices:  17,862  out  of  33,792  52%

Number  of  Block  RAMs:  1  out  of  144  1%

Number  of  MULT18X18s:  92  out  of  144  63%

freq  =  100.0  MHz

###################################################################### 

Figure  16.  Pipeline  depth  (NFG  and  SRC  Cosine  Macro).  Place  and  route  summary. 


Table 11 shows a comparison of the hardware used to build the NFG alone, the macro
alone, and both on the same FPGA. The comparison shows that the NFG approximation is
close to the macro in terms of hardware needed, with the exception of the multipliers:
the NFG requires slightly more than double the multipliers that the macro requires.

                         NFG Alone    Macro Alone    NFG & Macro
# of Slice Flip Flops    26%          21%            40%
# of 4 input LUTs        13%          14%            25%
# of occupied Slices     33%          27%            52%
# of Block RAMs          0%           1%             1%
# of MULT18X18s          44%          19%            63%
Freq                     100.2 MHz    100.1 MHz      100.0 MHz

Table 11. Comparison of NFG uniform segmentation and macros: NFG alone, macro alone
and both (function is $\cos(\pi x)$). Implementations without offset.


The  implementation  described  above  applies  to  functions  that  have  a  domain 
interval  that  starts  at  zero.  If  the  interval  starts  at  a  non-zero  value,  then  the  index 
computed  needs  to  be  adjusted  by  an  offset  value.  Figure  17  shows  the  hardware 
requirements  when  the  offset  is  applied. 
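
The index adjustment can be sketched in C as follows. This is an illustration under stated assumptions rather than the code of Appendix C: the type `QuadCoeff`, the function names, and the clamping of out-of-range inputs are all hypothetical, and the coefficient table stands in for the memory file loaded into OBM.

```c
#include <stddef.h>

/* One quadratic per segment: f(x) is approximated by a*x^2 + b*x + c. */
typedef struct {
    double a, b, c;
} QuadCoeff;

/* Uniform segmentation: all segments have the same width, so the index is
 * a subtraction (the offset) followed by a scaling. Requires num_segments >= 1. */
size_t uniform_segment_index(double x, double x_start, double x_end,
                             size_t num_segments)
{
    double width = (x_end - x_start) / (double)num_segments;
    double shifted = x - x_start;        /* offset for intervals not starting at 0 */
    if (shifted <= 0.0) {
        return 0;                        /* clamp inputs at or below the interval */
    }
    size_t idx = (size_t)(shifted / width);
    if (idx >= num_segments) {
        idx = num_segments - 1;          /* clamp the right endpoint and beyond */
    }
    return idx;
}

/* Evaluate the quadratic stored for segment idx. */
double nfg_eval(const QuadCoeff *table, size_t idx, double x)
{
    const QuadCoeff *q = &table[idx];
    return q->a * x * x + q->b * x + q->c;
}
```

When the interval starts at zero, the subtraction has no effect, which corresponds to the implementation without offset.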


###################################################################### 
##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  56: 

clocks  per  iteration:  1 

pipeline  depth:  98 

loop  on  line  74:

clocks  per  iteration:  1

pipeline  depth:  127

######################################################################
###############  PLACE  AND  ROUTE  SUMMARY  ####################


Number  of  Slice  Flip  Flops:  29,306  out  of  67,584  43%

Number  of  4  input  LUTs:  20,678  out  of  67,584  30%

Number  of  occupied  Slices:  20,125  out  of  33,792  59%

Number  of  Block  RAMs:  1  out  of  144  1%

Number  of  MULT18X18s:  72  out  of  144  50%

freq  =  100.0  MHz

###################################################################### 

Figure 17. Pipeline depth (NFG and SRC $\sqrt{-\ln(x)}$ implemented in macros). Place and
route summary with subtraction hardware included for computing the offset (when
finding the index of the coefficients).


The adjustment is a subtraction operation. In the floating point number system,
the hardware required to perform arithmetic computations is large, and adding a
subtraction increases the NFG pipeline depth from 84, as shown in Figure 15 and
Figure 16, to 98, as shown in Figure 17.

Figure  18  shows  the  comparison  between  the  output  of  the  macro  and  the  NFG. 
The  macro  computes  using  float  values,  while  the  NFG  can  compute  higher  precision 
values.  Therefore,  a  user  can  achieve  a  shorter  pipeline  depth  and  higher  accuracy  by 
using  the  NFG.  The  cost  of  using  the  NFG  is  that  the  user  must  have  a  memory  file  to 
load  the  coefficients  of  the  quadratic  approximation  into  OBM. 

Figure 18 shows the comparison of the results from the NFG, which uses a memory
file with the coefficients computed with an accuracy of $\varepsilon = 2^{-33}$. This
implementation has 459 segments and an accuracy of 32 bits.

The first labeled column in Figure 18, x Values, shows the values of x,
which in this case are the segment endpoints. Based on the Remez algorithm, the end
point, the begin point and two other points in the middle of each segment have the worst
case approximation error. Therefore, we expect the error at these points to be very
close to the maximum error allowed for the segmentation,
i.e., $\varepsilon = 2^{-33} = 1.1641532\ldots \times 10^{-10}$ (essentially at the 10th decimal place).
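
An error magnitude can be converted into a number of accurate bits by finding the largest n for which the error does not exceed $2^{-n}$. The sketch below uses that definition, which is one reasonable choice assumed here for illustration; the function name `accuracy_bits` and the cap of 64 bits are hypothetical.

```c
/* Number of accurate bits for a given error: the largest n (capped at 64)
 * such that |err| <= 2^-n. Returns 0 when |err| is larger than 1/2. */
int accuracy_bits(double err)
{
    double mag = (err < 0.0) ? -err : err;
    double threshold = 1.0;   /* 2^0 */
    int n = 0;
    /* Halve the threshold while the error still fits under the next power. */
    while (n < 64 && mag <= threshold * 0.5) {
        threshold *= 0.5;
        n++;
    }
    return n;
}
```

With this definition, an error of exactly $2^{-33}$ yields 33 bits, and any smaller error yields at least 33.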

Excel and MATLAB are used to compute $\cos(\pi x)$. The results for Excel and
MATLAB are exactly the same, as shown in Figure 18 in the column labeled
Excel-MATLAB (the difference of the results is zero). The NFG output and the SRC cosine
macro are compared to Excel, and the results are shown in the last two columns. Figure 18
shows that SRC's macro is accurate to $\varepsilon = 2^{-23}$, which is the correct accuracy
for floating point values. The NFG is accurate to within $2^{-33}$. This accuracy can be
increased without an increase in FPGA hardware, if desired. The cost is OBM memory to
store a larger coefficients table.

x Values             NFG Output      SRC Macro       Excel Cosine    MATLAB          Excel-MATLAB          ySRCMacro - Excel  NFG-Excel
                     (ysubr)         (ySRCMacro)
0.00089400089400090  0.999996055923  0.999996066093  0.999996055923  0.999996055923  0.000000000000000000   0.00000001017031  -0.00000000000016
0.00178850178850180  0.999984214899  0.999984204769  0.999984214899  0.999984214899  0.000000000000000000  -0.00000001012988  -0.00000000000049
0.00268300268300270  0.999964477019  0.999964475632  0.999964477020  0.999964477020  0.000000000000000000  -0.00000000138820  -0.00000000000081
0.00357750357750360  0.999936842441  0.999936819077  0.999936842442  0.999936842442  0.000000000000000000  -0.00000002336516  -0.00000000000114
0.00447200447200450  0.999901311383  0.999901294708  0.999901311383  0.999901311383  0.000000000000000000  -0.00000001667436  -0.00000000000001
0.00536600536600540  0.999857910602  0.999857902527  0.999857910603  0.999857910603  0.000000000000000000  -0.00000000807656  -0.00000000000178
0.00626050626050630  0.999806591898  0.999806582928  0.999806591900  0.999806591900  0.000000000000000000  -0.00000000897243  -0.00000000000211
0.00715500715500720  0.999747377743  0.999747395515  0.999747377745  0.999747377745  0.000000000000000000   0.00000001777088  -0.00000000000195
0.00804950804950800  0.999680268602  0.999680280685  0.999680268604  0.999680268604  0.000000000000000000   0.00000001208112  -0.00000000000276
0.00894400894400890  0.999605265006  0.999605238438  0.999605265009  0.999605265009  0.000000000000000000  -0.00000002657168  -0.00000000000307
0.00983850983850980  0.999522367549  0.999522387981  0.999522367552  0.999522367552  0.000000000000000000   0.00000002042948  -0.00000000000339
0.01073301073301070  0.999431576883  0.999431550503  0.999431576887  0.999431576887  0.000000000000000000  -0.00000002638399  -0.00000000000372
0.01162751162751160  0.999332893727  0.999332904816  0.999332893731  0.999332893731  0.000000000000000000   0.00000001108489  -0.00000000000406
0.01252201252201250  0.999226318859  0.999226331711  0.999226318863  0.999226318863  0.000000000000000000   0.00000001284753  -0.00000000000436
0.01341651341651340  0.999111853121  0.999111831188  0.999111853126  0.999111853126  0.000000000000000000  -0.00000002193770  -0.00000000000469
0.01431101431101430  0.998989497418  0.998989522457  0.998989497423  0.998989497423  0.000000000000000000   0.00000002503456  -0.00000000000501
0.01520501520501520  0.998859327721  0.998859345913  0.998859327726  0.998859327726  0.000000000000000000   0.00000001818690  -0.00000000000535
0.01609951609951610  0.998721199455  0.998721182346  0.998721199461  0.998721199461  0.000000000000000000  -0.00000001711431  -0.00000000000568
0.01699401699401700  0.998575184308  0.998575210571  0.998575184314  0.998575184314  0.000000000000000000   0.00000002625699  -0.00000000000601
0.01788851788851790  0.998421283434  0.998421311378  0.998421283440  0.998421283440  0.000000000000000000   0.00000002793842  -0.00000000000633
0.01878301878301880  0.998259498047  0.998259484768  0.998259498053  0.998259498053  0.000000000000000000  -0.00000001328536  -0.00000000000665
0.01967751967751970  0.998089829425  0.998089849949  0.998089829432  0.998089829432  0.000000000000000000   0.00000002051732  -0.00000000000698
0.02057202057202060  0.997912278907  0.997912287712  0.997912278915  0.997912278915  0.000000000000000000   0.00000000879730  -0.00000000000730
0.02146652146652150  0.997726847897  0.997726857662  0.997726847905  0.997726847905  0.000000000000000000   0.00000000975711  -0.00000000000763
0.02236102236102240  0.997533537859  0.997533559799  0.997533537867  0.997533537867  0.000000000000000000   0.00000002193241  -0.00000000000795
0.02325552325552330  0.997332350318  0.997332334518  0.997332350326  0.997332350326  0.000000000000000000  -0.00000001580801  -0.00000000000828
0.02415002415002410  0.997123286864  0.997123301029  0.997123286873  0.997123286873  0.000000000000000000   0.00000001415636  -0.00000000000860
0.02504402504402500  0.996906472609  0.996906459332  0.996906472618  0.996906472618  0.000000000000000000  -0.00000001328664  -0.00000000000891
0.02593852593852590  0.996681666744  0.996681690216  0.996681666753  0.996681666753  0.000000000000000000   0.00000002346290  -0.00000000000925
0.02683302683302680  0.996448990104  0.996448993683  0.996448990113  0.996448990113  0.000000000000000000   0.00000000356950  -0.00000000000957

Float Accuracy   0.00000011920929
32 Bit Accuracy  0.00000000011642

Figure 18. Results from Uniform Segmentation NFG compared with SRC Cosine Macro, MATLAB and Excel.


The results in Figure 18 show that the accuracy of the NFG can be increased to 33
bits. To take advantage of uniform segmentation, we need to know the number of
segments it requires. The quadratic coefficients for the numeric
functions are stored in OBM. Table 12 shows the number of segments required
for each of the accuracies. All the segment counts shown can be implemented in the NFG,
even when the number of segments is as large as 34483, as in the numeric function
$\sqrt{-\ln(x)}$.


Numeric Function                              Number of Segments
                                              $\varepsilon = 2^{-17}$   $\varepsilon = 2^{-24}$   $\varepsilon = 2^{-33}$
$2^x$                                         8        39       311
$1/x$                                         17       81       646
$\sqrt{x}$                                    7        33       257
$1/\sqrt{x}$                                  11       55       439
$\log_2(x)$                                   13       64       506
$\ln(x)$                                      12       56       448
$\sin(\pi x)$                                 14       70       559
$\cos(\pi x)$                                 14       70       559
$\tan(\pi x)$                                 18       88       704
$\sqrt{-\ln(x)}$                              794      4017     34483
$\tan^2(\pi x) + 1$                           30       151      1204
$-(x \log_2 x + (1-x) \log_2(1-x))$           399      2013     16667
$\frac{1}{1 + e^{-x}}$                        5        23       178
$\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$            11       52       412
$\sin(e^x)$                                   125      627      5103

Table 12. Number of segments required for uniform segmentation, computed with
N = 1,000,000 for various values of $\varepsilon$.


2.  Fixed  Point  Implementation 

The fixed point implementation has a shorter pipeline depth. Numeric function
$2^x$ has a pipeline depth of 31 in fixed point and 84 in floating point uniform
segmentation. The multiplier inferred by the SRC accepts 64 bit operands and outputs a
64 bit product that contains only the lower 64 bits of the computed 128 bit product. This
presents a challenge when computing in the fixed point number system, as discussed in
Section III.B.4.a.
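
The loss can be reproduced in ordinary C, where unsigned 64 bit multiplication also keeps only the lower 64 bits of the product. The sketch assumes a fixed point format with 32 fractional bits, in which 1.0 is stored as 0x100000000. That format and the names `Q32_ONE` and `mul_low64` are assumptions made for illustration.

```c
#include <stdint.h>

#define Q32_ONE ((uint64_t)1 << 32)  /* 1.0 in the assumed Q32 format */

/* Unsigned multiplication in C wraps modulo 2^64, so this returns only the
 * lower 64 bits of the full 128 bit product. */
uint64_t mul_low64(uint64_t a, uint64_t b)
{
    return a * b;
}
```

Two Q32 operands give a product with 64 fractional bits, so the lower 64 bits hold only the fraction. For example, 0.5 × 0.5 survives as $2^{62}$ (0.25 with 64 fractional bits), but 1.5 × 2.0 returns 0 because the integer part, 3, lies entirely in the discarded upper half.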

Table 13 shows the fixed point implementation without any special adjustments to
the bits. The function is $2^x$. The green portion of the table did not require any
adjustment. In the yellow section, adjustments are required to eliminate the unintended
sign extension of shifted values. The last two columns show the accuracy of the NFG.
The very last column shows the accuracy when rounding is performed (only on the final
result, not at any intermediate points).


Index   x Values    Excel 2^x    NFG Approx   Accurate to x Bits   If rounding were performed
0       6905840     104972342    1049722c3    23 bits              24 bits
1       d20c146     1094364e6    109436464    24 bits              24 bits
2       13b12a4d    10e051a07    10e051983    22 bits              24 bits
3       1a419353    112dca51f    112dca498    23 bits              24 bits
4       20d1fc5a    117ca6a6a    117ca69e0    22 bits              24 bits
5       27626560    11ccecff2    11ccecf66    24 bits              24 bits
6       2df2ce67    121ea3d94    121ea3d06    24 bits              24 bits
7       3483376e    1271d1d0a    1271d1c7b    23 bits              23 bits
8       3b13a074    12c67d9f5    12c67d962    24 bits
24      a41a41a4    18f374a1d    18f374959    22 bits              23 bits
25      aaaaaaab    1965fea54    1965feb1d    23 bits              23 bits
26      b13b13b1    19da96753    19da96689    22 bits              24 bits
27      b7cb7cb8    1a51457f7    1a51458c6    20 bits              21 bits
28      be5be5be    1aca155ce    1aca154fc    23 bits              24 bits
36      f2df2df3    1ee1ebf39    1ee1ec02c    21 bits              21 bits
37      f96f96f9    1f6fb0940    1f6fb084a    23 bits              23 bits
38      100000000   200000000    0            N/A                  N/A

Table 13. Fixed point implementation of $2^x$, no bit shifts, N = 1,000,000 and $\varepsilon = 2^{-24}$.

Table 13 shows that the accuracy³ degrades in segments of higher index. This is
expected because uniform segmentation results in segments that have varied accuracy.
Figure 19 shows the error expected for uniform segmentation of $2^x$, which is consistent
with the results in Table 13. When implemented in hardware, this design does not meet
the target accuracy because the values are truncated at various intermediate points in
the computation. The truncation error propagates and is magnified in the result.

A  bigger  problem  exists  in  indexing.  In  Table  13,  the  coefficients  used  to 
compute  the  NFG  output  for  index  24,  were  actually  coefficients  intended  for  segment 
25.  The  segment  indexing  failed  to  give  the  correct  index.  These  problems  contributed 
to  the  lower  output  accuracy  as  is  seen  in  the  second  from  last  column  in  Table  13. 

The advantage of using $2^x$ is that all input values are less than 1.0 except for the
last, where x is 1.0. There are no integers to deal with in this example.


[Plot: error (× 10^-8) for uniform segmentation of f(x) = 2^x; number of segments = 39.]


Figure 19. Uniform segmentation of $2^x$, N = 1,000,000 and $\varepsilon = 2^{-24}$.


³ The endpoints of the segments are used as the x input values to test the numeric
functions. The endpoints have the worst case approximation error, so Table 13 shows the
worst case scenario.




This  implementation  works  for  only  a  few  functions.  To  make  it  work  for  the  rest 
of  the  functions,  a  better  method  is  required  to  handle  integers  and  rounding. 

Table 14 shows the implementation adjusted to accommodate the integers. As
described in Section III.B.4.b, an arithmetic shift right (8 bits) is performed on the
multiplication operands before multiplication. The product now has 16 bits to represent
the integer portion of the product. This is enough for all the values that will be
encountered in the suite of functions investigated.
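
The operand shift can be sketched as follows, again assuming a fixed point format with 32 fractional bits. The function name `mul_q32_preshift8` and the final realignment shift of 16 bits are illustrative assumptions; the sketch also relies on `>>` being an arithmetic shift for negative `int64_t` values, which is implementation-defined in C but is what common compilers provide.

```c
#include <stdint.h>

/* Multiply two Q32 values after shifting each operand right by 8 bits.
 * Q24 x Q24 gives a Q48 product, which leaves 16 bits (sign included)
 * above the binary point of a 64 bit result. The caller must ensure the
 * true product fits in that range. */
int64_t mul_q32_preshift8(int64_t a_q32, int64_t b_q32)
{
    int64_t a_q24 = a_q32 >> 8;        /* low 8 fractional bits are truncated */
    int64_t b_q24 = b_q32 >> 8;
    int64_t prod_q48 = a_q24 * b_q24;  /* 48 fractional bits */
    return prod_q48 >> 16;             /* realign to Q32 for the final addition */
}
```

With this arrangement 1.5 × 2.0 yields 0x300000000 (3.0 in Q32) instead of being lost, at the cost of the 8 fractional bits dropped from each operand.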

The worst case function is $\sqrt{-\ln(x)}$, which has large coefficients. Whenever the
coefficients are very large, the impact of small numbers is larger and therefore there is
greater room for error. When the operands are shifted, the values are truncated, which
propagates error to the product. The last column of Table 14 shows the accuracy.


INDEX  Row      x         x^2      a          ax^2      b                 bx                c          f(x)       Accuracy
1959   NFG      1f7bb7    3df32    8b2b821    21ad89af  fffffffffb0d5a05  ffffffff6439c766  1ecb291c2  17299e2d7  25 bits
1959   Correct  1f7bb77c  3df323a  8b2b82181  21ad8ba8  fffffffb0d5a05a1  ffffffff6439c516  1ecb291c2  17299e284
1960   NFG      1f7fc3    3e030    8b09553    21ade3bd  fffffffffb0de083  ffffffff64364dd0  1ecaa4c9b  1728e7e28  16 bits
1960   Correct  1f7fc3b5  3e0312a  8b09553aa  21adedda  fffffffb0de083fd  ffffffff64364a70  1ecaa4c9b  1728e84e6
1961   NFG      1f83cf    3e12f    8ae7350    21ae4550  fffffffffb0e66e1  ffffffff6432d489  1eca20868  172832241  20 bits
1961   Correct  1f83cfee  3e1303a  8ae7350d8  21ae4ffb  fffffffb0e66e1ad  ffffffff6432d005  1eca20868  172832868
1962   NFG      1f87dc    3e22f    8ac5215    21aeae58  fffffffffb0eed1f  ffffffff642f56a0  1ec99c51b  17277ca13  21 bits
1962   Correct  1f87dc27  3e22f6b  8ac5215bd  21aeb200  fffffffb0eed1f75  ffffffff642f55ec  1ec99c51b  17277cd06
1963   NFG      1f8be8    3e32e    8aa31a7    21af0d91  fffffffffb0f733c  ffffffff642bddd6  1ec9182c8  1726c6e2f  19 bits
1963   Correct  1f8be861  3e32ebe  8aa31a702  21af13fc  fffffffb0f733c23  ffffffff642bdbfd  1ec9182c8  1726c72c1
1964   NFG      1f8ff4    3e42e    8a811fc    21af742b  fffffffffb0ff939  ffffffff64286557  1ec894153  172611ad5  22 bits
1964   Correct  1f8ff49a  3e42e30  8a811fcf0  21af75d3  fffffffb0ff93990  ffffffff64286272  1ec894153  172611997
1965   NFG      1f9400    3e52d    8a5f31f    21afd103  fffffffffb107f15  ffffffff6424ed03  1ec8100da  17255bee0  17 bits
1965   Correct  1f9400d3  3e52dc4  8a5f31f3f  21afd7a4  fffffffb107f15ca  ffffffff6424e90a  1ec8100da  17255c189
1966   NFG      1f980d    3e62d    8a3d508    21b03549  fffffffffb1104d2  ffffffff64217027  1ec78c14c  1724a66bc  23 bits
1966   Correct  1f980d0c  3e62d78  8a3d508d1  21b0395e  fffffffb1104d205  ffffffff64216fec  1ec78c14c  1724a6a96
1967   NFG      1f9c19    3e72d    8a1b7b9    21b09860  fffffffffb118a6e  ffffffff641df863  1ec7082a9  1723f136c  21 bits
1967   Correct  1f9c1945  3e72d4e  8a1b7b98d  21b09b00  fffffffb118a6e3d  ffffffff641df714  1ec7082a9  1723f14be
1968   NFG      1fa025    3e82d    89f9b37    21b0fa5d  fffffffffb120fe8  ffffffff641a80a5  1ec684509  17233c00b  28 bits
1968   Correct  1fa0257f  3e82d44  89f9b3736  21b0fca5  fffffffb120fe8f7  ffffffff641a7e53  1ec684509  17233c001
1969   NFG      1fa431    3e92d    89d7f75    21b15b0f  fffffffffb129545  ffffffff64170989  1ec60083c  172286cd4  24 bits
1969   Correct  1fa431b8  3e92d5a  89d7f754d  21b15e1b  fffffffb1295453d  ffffffff64170607  1ec60083c  172286c5e
1970   NFG      1fa83d    3ea2d    89b647f    21b1baa9  fffffffffb131a80  ffffffff64139271  1ec57cc71  1721d198b  25 bits
1970   Correct  1fa83df1  3ea2d92  89b647f6b  21b1bf92  fffffffb131a8026  ffffffff64138dd2  1ec57cc71  1721d19d6

Table 14. Fixed point, uniform segmentation of $\sqrt{-\ln(x)}$, multiplier operands shifted
by 8 bits, N = 1,000,000 and $\varepsilon = 2^{-24}$.

In Table 14, the first column is the index into the array. Each index has two rows:
the NFG computation is in the colored row (the first row of each pair), and the row below
shows the correct values, which have been computed in MATLAB and converted to the number
representation. Coefficients a and b, the input x and $x^2$ are shifted 8 bits in the NFG
(colored rows). The intermediate products show the error in the intermediate steps. The
two products, $ax^2$ and $bx$, have been realigned before the final addition step. Table 14
shows the effect of the error as it propagates from the intermediate steps to the final
answer. The last column shows the number of bits that match between the NFG output (in
the colored row) and the desired output, that is, how accurately the NFG has performed.
As can be seen, there are instances where the error is large.

Table  15  shows  the  pipeline  depth  is  32.  It  also  shows  the  summary  of  place  and 
route  and  hardware  resource  requirements  to  implement  uniform  segmentation  using 
fixed  point  numbers.  This  data  is  the  same  for  all  the  numeric  functions.  The  memory 
file  determines  which  numeric  function  will  be  implemented. 


###################################################################### 
##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  54 : 

clocks  per  iteration:  1 

pipeline  depth:  32 

###################################################################### 
###############  PLACE  AND  ROUTE  SUMMARY  #################### 


Number  of  Slice  Flip  Flops:  8,751  out  of  67,584  12%

Number  of  4  input  LUTs:  3,282  out  of  67,584  4%

Number  of  occupied  Slices:  5,226  out  of  33,792  15%

Number  of  MULT18X18s:  40  out  of  144  27%

freq  =  100.0  MHz

###################################################################### 


Table  15.  Pipeline  depth  and  hardware  resources  for  uniform  implementation  with  no 

adjustments. 


Table 16 is a comparison of uniform segmentation between the floating point and
fixed point NFG implementations. They both require the same size memory files, but the
floating point hardware can handle a larger range of values than the fixed point
implementation.

                         Floating Point   Fixed Point   Fixed Point / Floating Point
Pipeline Depth           84               32            38%
# of Slice Flip Flops    26%              12%           46%
# of 4 input LUTs        13%              4%            31%
# of occupied Slices     33%              15%           45%
# of Block RAMs          0%               0%            0%
# of MULT18X18s          44%              27%           61%
Freq                     100.2 MHz        100.1 MHz     0%

Table 16. Comparison of uniform segmentation NFG between fixed point and floating
point.


B.  NON-UNIFORM  SEGMENTATION 

Non-uniform  segmentation  requires  a  segment  index  encoder.  The  SRC 
programming  environment  has  a  priority  selector  macro  that  is  used  as  the  segment  index 
encoder  for  the  NFG. 
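
The role of the segment index encoder can be illustrated with a small software model. This is a sketch of the idea only, not the SRC priority selector macro: the function name `segment_index`, the use of an array of segment upper endpoints, and the clamping of out-of-range inputs are assumptions made for illustration.

```c
#include <stddef.h>

/* Software model of a priority selector used as a segment index encoder.
 * upper_bounds holds the upper endpoint of each segment in ascending order.
 * Requires num_segments >= 1. */
size_t segment_index(const double *upper_bounds, size_t num_segments, double x)
{
    for (size_t i = 0; i < num_segments; i++) {
        if (x <= upper_bounds[i]) {
            return i;   /* first match wins, as in a priority encoder */
        }
    }
    return num_segments - 1;  /* clamp inputs beyond the last endpoint */
}
```

In hardware all comparisons are made in parallel and the lowest matching index is selected, so the cost grows with the number of segments.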


1.  Floating  Point  Implementation 

The priority selector macro in the SRC is used as the segment index encoder.
The priority selector has a limit (approximately 150 elements) when used in the NFG
with three 64 bit multipliers. The non-uniform segmentation NFG, in floating point, has
a pipeline depth of 74.

The math macros available in the SRC have pipeline depths that vary. For
example, $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ implemented using the math macros has a pipeline
depth of 274, as shown in Table 17. Table 17 summarizes the hardware pipeline depth for
the suite of numeric functions. The table shows side by side comparisons of the pipeline
depth for the NFG and the SRC math macros. In 10 of the 15 functions, the NFG pipeline
depth is smaller. For one function the pipeline depths are the same, and for 4 of the
functions the NFG pipeline depth is larger. Regardless of the size of the function, the
NFG has the same pipeline depth; the only exception is $\sin(e^x)$, which is only one clock
longer.

Three functions in Table 17 are limited by the number of segments required. In
the floating point implementation, with 3 multipliers and the other hardware requirements,
the FPGA runs out of resources to build large priority selectors. The priority selectors
were limited to approximately 150 segments. Implementations requiring larger selectors
did not compile on the MAP. The data was obtained by compiling in debug mode. Some
of the implementations were built in hardware, for example
$\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.

Numeric Function                     Macro            NFG              Number of Segments
                                     Pipeline Depth   Pipeline Depth   (ε = 2^-24)
2^x                                  132              74               35
1/x                                  70               74               50
sqrt(x)                              43               74               24
1/sqrt(x)                            74               74               36
log2(x)                              73               74               44
ln(x)                                61               74               39
sin(πx)                              105              74               58
cos(πx)                              105              74               58
tan(πx)                              135              74               58
sqrt(-ln(x))                         127              74               163*
tan^2(πx) + 1                        254              74               79
-(x log2(x) + (1-x) log2(1-x))       114              74               183*
1/(1 + e^-x)                         185              74               20
(1/sqrt(2π)) e^(-x^2/2)              274              74               45
sin(e^x)                             212              75               265*

Table 17. Pipeline depth for various implementations using the available macros or the NFG in the floating point number system.


* Note that these numbers (number of segments) are larger than 150, and cannot be realized in the priority selector in the floating point implementation.
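The counts quoted for Table 17 (NFG smaller in 10 functions, equal in one, larger in 4) can be checked with a short tally. The sketch below copies the two pipeline depth columns from the table; the dictionary keys, including the shorthand labels "entropy", "sigmoid" and "gaussian", are our own and are not names used elsewhere in this thesis.

```python
# (macro pipeline depth, NFG pipeline depth), copied from Table 17.
TABLE_17 = {
    "2^x": (132, 74),
    "1/x": (70, 74),
    "sqrt(x)": (43, 74),
    "1/sqrt(x)": (74, 74),
    "log2(x)": (73, 74),
    "ln(x)": (61, 74),
    "sin(pi*x)": (105, 74),
    "cos(pi*x)": (105, 74),
    "tan(pi*x)": (135, 74),
    "sqrt(-ln(x))": (127, 74),
    "tan^2(pi*x)+1": (254, 74),
    "entropy": (114, 74),    # -(x log2 x + (1-x) log2(1-x))
    "sigmoid": (185, 74),    # 1/(1 + e^-x)
    "gaussian": (274, 74),   # (1/sqrt(2 pi)) e^(-x^2/2)
    "sin(e^x)": (212, 75),
}


def tally(table):
    """Count functions where the NFG depth is smaller, equal, or larger."""
    smaller = sum(1 for macro, nfg in table.values() if nfg < macro)
    equal = sum(1 for macro, nfg in table.values() if nfg == macro)
    larger = sum(1 for macro, nfg in table.values() if nfg > macro)
    return smaller, equal, larger


print(tally(TABLE_17))  # (10, 1, 4)
```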
When both the NFG and the macro are built on the FPGA, a large amount of resources is consumed, and the frequency may be affected by place and route difficulties and increased delay in the wiring. Figure 20 shows the summary of the place and route when the numeric function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is implemented with the macros and with the NFG, both on the same FPGA. The frequency is 77.2 MHz.


###################################################################### 

##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  53: 

clocks  per  iteration:  1 

pipeline  depth:  74 

loop  on  line  139: 

clocks  per  iteration:  1 

pipeline  depth:  274 

###################################################################### 

###############  PLACE  AND  ROUTE  SUMMARY  #################### 

Number  of  Slice  Flip  Flops:  51,967  out  of  67,584  76% 

Number  of  4  input  LUTs:  39,520  out  of  67,584  58% 

Number  of  occupied  Slices:  33,790  out  of  33,792  99% 

Number  of  Block  RAMs :  3  out  of  144  2% 

Number  of  MULT18X18s:  90  out  of  144  62% 

freq  =  77.2  MHz 

###################################################################### 

Figure 20. NFG and macro both built on the FPGA for the numeric function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
The performance improves if only one is built at a time. Figure 21 shows the same function built on the FPGA using the NFG only. The frequency is 100.0 MHz.


###################################################################### 
##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  53: 

clocks  per  iteration:  1 

pipeline  depth:  74 

###################################################################### 


###############  PLACE  AND  ROUTE  SUMMARY  ####################

Number  of  Slice  Flip  Flops:  26,377  out  of  67,584  39%

Number  of  4  input  LUTs:  16,386  out  of  67,584  24%

Number  of  occupied  Slices:  17,473  out  of  33,792  51%

Number  of  MULT18X18s:  48  out  of  144  33%

freq  =  100.0  MHz 

###################################################################### 


Figure 21. NFG built on the FPGA for the numeric function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.


Table 18 shows the results from computing $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ with N=1,000,000 and $\varepsilon = 2^{-24}$. The values are displayed to twelve decimal places. This function requires 45 segments. The values of x that are tested in Table 18 are segment endpoints and therefore have the worst case (see footnote 5) approximation error. At the very bottom of Table 18 is $\varepsilon = 2^{-24}$ in decimal. The last column shows that the approximation error is consistently smaller than $\varepsilon$, per the design.


5  If  the  x  input  to  the  NFG  were  somewhere  in  the  middle  of  the  segment,  the  approximation  error 
would  be  smaller.  There  are  four  points  in  a  segment  with  worst  case  approximation  error.  Figure  10  is  a 
good  example  to  see  the  distribution  of  the  approximation  error  on  a  non-uniform  segment. 
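The error bound can be spot-checked outside the SRC. The sketch below is an illustration only: it takes four (x, NFG output) pairs as printed in Table 18, recomputes the reference value with Python's `math` library instead of Excel, and confirms that the magnitude of the difference stays below $2^{-24}$. The names `gaussian`, `ROWS` and `EPSILON` are our own.

```python
import math

EPSILON = 2.0 ** -24  # design accuracy, about 0.000000059605


def gaussian(x):
    """Reference value of (1/sqrt(2*pi)) * exp(-x^2 / 2)."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


# (x, NFG output) pairs copied from Table 18.
ROWS = [
    (0.065896761049, 0.398076980336),
    (0.297033228393, 0.381725674206),
    (1.010456600772, 0.239440565640),
    (1.414213562373, 0.146762652495),
]

for x, nfg in ROWS:
    error = nfg - gaussian(x)
    print(f"x = {x:.12f}  NFG - reference = {error:+.12f}  "
          f"within epsilon: {abs(error) < EPSILON}")
```

Because the table values are rounded to twelve decimal places, the recomputed differences agree with the NFG-Excel column only to about that precision.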
x                NFG              SRC OUTPUT       Excel            SRC-Excel         NFG-Excel
                 (194 clocks)     (396 clocks)
0.065896761049   0.398076980336   0.398077040911   0.398077039931    0.000000000979   -0.000000059595
0.113411555833   0.396384819146   0.396384894848   0.396384878748    0.000000016100   -0.000000059601
0.155068672183   0.394174398848   0.394174486399   0.394174458446    0.000000027952   -0.000000059598
0.193392483833   0.391551192645   0.391551256180   0.391551252249    0.000000003931   -0.000000059604
0.229466279456   0.388576167409   0.388576239347   0.388576227007    0.000000012340   -0.000000059598
0.263888271986   0.385290687187   0.385290741920   0.385290746785   -0.000000004864   -0.000000059597
0.297033228393   0.381725674206   0.381725728512   0.381725733800   -0.000000005288   -0.000000059593
0.329159950016   0.377905230684   0.377905279398   0.377905290287   -0.000000010889   -0.000000059603
0.360453699018   0.373849440845   0.373849511147   0.373849500448    0.000000010698   -0.000000059604
0.391055896896   0.369575196048   0.369575262070   0.369575255651    0.000000006419   -0.000000059603
0.421076852419   0.365097238417   0.365097314119   0.365097298012    0.000000016107   -0.000000059595
0.450608489560   0.360428130051   0.360428184271   0.360428189648   -0.000000005377   -0.000000059598
0.479725761713   0.355579258917   0.355579316616   0.355579318519   -0.000000001902   -0.000000059601
1.010456600772   0.239440565640   0.239440649748   0.239440625229    0.000000024519   -0.000000059589
1.039409823988   0.232439528403   0.232439562678   0.232439587993   -0.000000025315   -0.000000059590
1.068692559293   0.225374753587   0.225374817848   0.225374813189    0.000000004659   -0.000000059602
1.098347233137   0.218248244336   0.218248322606   0.218248303940    0.000000018667   -0.000000059604
1.128421928829   0.211061263284   0.211061343551   0.211061322887    0.000000020664   -0.000000059603
1.158970386539   0.203814501730   0.203814581037   0.203814561328    0.000000019709   -0.000000059597
1.190056245939   0.196507285750   0.196507364511   0.196507345350    0.000000019161   -0.000000059600
1.221750217779   0.189138515593   0.189138561487   0.189138575189   -0.000000013702   -0.000000059596
1.254138569173   0.181705027166   0.181705087423   0.181705086768    0.000000000655   -0.000000059602
1.287320295169   0.174202711977   0.174202784896   0.174202771576    0.000000013320   -0.000000059599
1.321418432469   0.166624545576   0.166624620557   0.166624605173    0.000000015384   -0.000000059598
1.356585716292   0.158960217743   0.158960267901   0.158960277339   -0.000000009438   -0.000000059596
1.393018722519   0.151194300960   0.151194363832   0.151194360555    0.000000003277   -0.000000059595
1.414213562373   0.146762652495   0.146762669086   0.146762663174    0.000000005913   -0.000000010679

2^-24 Accuracy: 0.000000059605

Table 18. Comparison between SRC macro and NFG; numeric function $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, N=1,000,000 and $\varepsilon = 2^{-24}$.


2.  Fixed  Point  Implementation 

As mentioned before, the advantage of using fixed point is the reduction in hardware and the reduced pipeline depth. The disadvantage is that it takes more work to program.


Macros may be used to define certain behavior that is easier to describe in HDL or to provide special functionality that is not available in regular programming. In the NFG, the multiplier is limited by the 64 bit architecture. The product of two 64 bit numbers does not give the user access to all 128 bits in the product. HDL can be used to manipulate and access the desired bits.


a.  No  Macro  Multiplier  (non-uniform) 

The fixed point implementation without a macro multiplier is exactly the same as the uniform segmentation fixed point implementation, with only one exception: the indexing in non-uniform segmentation is accomplished using the user callable macro, priority selector, available in the SRC.


###################################################################### 
##################  INNER  LOOP  SUMMARY  #################### 

loop  on  line  46: 

clocks  per  iteration:  1 

pipeline  depth:  28 

###################################################################### 
###############  PLACE  AND  ROUTE  SUMMARY  #################### 


Number  of  Slice  Flip  Flops:  8,283  out  of  67,584  12%

Number  of  4  input  LUTs:  12,331  out  of  67,584  18%

Number  of  occupied  Slices:  11,256  out  of  33,792  33%

Number  of  MULT18X18s:  30  out  of  144  20%

freq  =  100.2  MHz 

###################################################################### 


Table 19. Pipeline depth, place and route summary for $\sqrt{-\ln(x)}$, N=1,000,000 and $\varepsilon = 2^{-24}$. Non-uniform segmentation using priority selector macro.


b.  Macro  Multiplier  Implementation 

The  goal  is  to  build  a  multiplier  in  VHDL  or  Verilog  that  can  successfully 
multiply  in  two’s  complement  and  provide  a  result  that  is  already  shifted  into  the  number 
system  chosen  for  fixed  point.  Specifically,  we  want  a  product  that  is  32  bits  integer  and 
32  bits  fraction. 
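The intended behavior of such a multiplier can be modeled in software. The sketch below is an assumption-laden illustration and not the VHDL or Verilog of Appendix B: it represents values with 32 integer bits and 32 fraction bits, forms the full product (up to 128 bits), and keeps bits 95 down to 32 so the result is already in the same number system. The helper names `to_fixed`, `to_signed`, `fixed_mul` and `to_float` are hypothetical.

```python
FRAC_BITS = 32   # fraction bits in the fixed point format
WORD_BITS = 64   # total word length (32 integer + 32 fraction bits)
MASK = (1 << WORD_BITS) - 1


def to_fixed(value):
    """Convert a real number to a signed fixed point integer."""
    return int(round(value * (1 << FRAC_BITS)))


def to_signed(word):
    """Interpret the low 64 bits of word as a two's complement number."""
    word &= MASK
    return word - (1 << WORD_BITS) if word >> (WORD_BITS - 1) else word


def fixed_mul(a, b):
    """Multiply two fixed point values and keep product bits 95..32."""
    full = a * b                          # full product, 64 fraction bits
    return to_signed(full >> FRAC_BITS)   # drop the 32 lowest fraction bits


def to_float(a):
    """Convert a fixed point integer back to a real number."""
    return a / (1 << FRAC_BITS)


print(to_float(fixed_mul(to_fixed(1.5), to_fixed(-2.25))))  # -3.375
```

The right shift here truncates the discarded fraction bits, which corresponds to the absence of rounding discussed under the sources of error.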

Several multipliers were built. The multipliers function correctly in simulation on PCs using Xilinx ISE, Project Navigator and the Modelsim simulation software. However, when the VHDL or Verilog files were compiled on the SRC, the products were not correct.
Appendix B shows the VHDL code for a 32x32 bit multiplier with a 32 bit product. The design instantiates the 18x18 signed multiplier primitive. The design makes use of a modified I/O pipeline design from a Xilinx application note [22].

Appendix  B  also  shows  the  Verilog  file  for  a  64x64  bit  multiplier  with  a 
64  bit  product.  The  64x64  bit  multiplier  makes  use  of  the  source  code  for  the  64x64  bit 
multiplier  macro  designed  by  SRC. 


C.  SOURCES  OF  ERROR 

The floating point implementation has only the errors associated with the MATLAB computed values and the restrictions placed on $\varepsilon$. When implemented in the SRC, double precision accurately represents the values fed into the NFG and the coefficients table.

The  fixed  point  implementation  had  errors  due  to  several  reasons.  We  explore 
some  of  those  reasons  for  error  in  the  NFG  as  a  whole. 

1.  Function  Approximation 

Both  floating  point  and  fixed  point  have  to  work  with  approximation  error.  This 
is  discussed  in  detail  in  section  II  B  (Segmentation). 

2.  Absence  of  Rounding  in  the  Multiplier 

The  fixed  point  implementation  of  the  NFG  shifts  binary  bits  and  truncates  lower 
and  upper  bits.  This  introduces  error  in  computing  the  products  and  these  errors 
propagate  to  the  final  answer. 
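The size of this error can be illustrated with a small experiment. The sketch below is a software model under our own assumptions, not the NFG circuit: it drops the lowest s bits of every non-negative 10 bit value, once by truncation and once by adding half of the discarded weight before shifting (round to nearest), and reports the worst error in units of the retained least significant bit. The function names are hypothetical.

```python
def shift_truncate(p, s):
    """Drop the s lowest bits (what a plain right shift does)."""
    return p >> s


def shift_round(p, s):
    """Add half of the discarded weight, then drop the s lowest bits."""
    return (p + (1 << (s - 1))) >> s


def worst_error(shift_fn, s, values):
    """Largest |exact - result| in units of the retained LSB."""
    return max(abs(p / (1 << s) - shift_fn(p, s)) for p in values)


S = 4
VALUES = range(1 << 10)
print(worst_error(shift_truncate, S, VALUES))  # 0.9375
print(worst_error(shift_round, S, VALUES))     # 0.5
```

Truncation can lose almost a full least significant bit per operation, while rounding loses at most half of one, so errors that propagate through the two multiplications are smaller with rounding.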


3.  Insufficient  Bits 

Insufficient bits to represent the full product means that the numbers have to be shifted and truncated. This limits the accuracy that the NFG can achieve.
D.  SUMMARY

The NFG implementation of uniform segmentation using the floating point number system has a pipeline depth of 84 or 98, depending on whether the begin point of the domain interval is zero or non-zero (zero is preferred). This implementation must read a memory file containing the polynomial coefficients into OBM. Aside from these requirements, the NFG implemented with uniform segmentation in the floating point number system provides advantages over using the available user callable macros and the math operators. It can be implemented with very high precision, a shorter pipeline depth and, in some cases, less hardware.

Another  advantage  of  the  uniform  segmentation  is  that  once  compiled,  the  NFG 
can  compute  any  of  the  15  functions.  The  memory  file  with  the  coefficients  must  be 
available. 

The  NFG  non-uniform  implementation  has  a  shorter  pipeline  depth,  but  requires 
much  hardware  to  implement  the  segment  index  encoder.  The  segment  index  encoder  is 
limited  to  approximately  150  segments  in  this  design.  Depending  on  the  function,  the 
precision  can  be  increased  as  long  as  the  number  of  segments  does  not  exceed 
approximately  150. 

The  fixed  point  implementation  requires  a  rounding  macro  and  a  good  macro 
multiplier  to  provide  the  desired  product  bits  and  make  it  effective.  However,  it  provides 
a  significantly  smaller  pipeline  depth  than  the  floating  point  implementation. 

A real advantage of the NFG appears when very complicated numeric functions need to be implemented: the NFG has a constant pipeline depth, whereas macro implementations of the more complicated functions have long pipeline depths.

More  research  is  required  to  realize  a  complete  NFG  design.  Section  VI  discusses 
some  suggestions  for  future  work. 
VI.  CONCLUSION


A.  SUMMARY  OF  WORK 

An  efficient  and  fast  segmentation  of  numeric  functions  was  accomplished  in 
MATLAB.  Table  20  shows  the  number  of  tests  (calls  to  chebyRemz )  required  to  segment 
the  suite  of  15  functions. 


Epsilon = 0.0000000596 = 2^-24.0
N = 1000000

Function            Interval            % of tests    # of Segments
2^x                 [0,1]               0.00910       35
1./x                [1,2]               0.01020       50
sqrt(x)             [1,2]               0.00750       24
1/sqrt(x)           [1,2]               0.00720       36
log2(x)             [1,2]               0.00900       44
log(x)              [1,2]               0.00780       39
sin(pi*x)           [0,1/2]             0.01990       58
cos(pi*x)           [0,1/2]             0.01740       58
tan(pi*x)           [0,1/4]             0.01240       58
sqrt(-log(x...      [1/512,1/4]         0.04070       163
tan(pi*x).^...      [0,1/4]             0.02180       79
-(x*log2(x)...      [1/256,1-1/256]     0.04710       183
1/(1+exp(-x...      [0,1]               0.00920       20
(1/sqrt(2*p...      [0,sqrt(2)]         0.01670       45
sin(exp(x))         [0,2]               0.07810       265

Table 20. Speed-up in computation time for 15 functions (expressed as a percentage of the time needed when the domain is divided into 1,000,000 points) for $\varepsilon = 2^{-24}$.

The NFG circuit built in the SRC was very effective in floating point. The computation of numeric functions in the NFG was shown to obtain accuracy of up to 33 bits. Higher accuracy is possible at the cost of increasing the size of the memory files required to store the coefficients.

Floating  point  implementation  was  easier  to  build  on  the  SRC  than  the  fixed  point 
implementation.  However,  floating  point  implementation  takes  up  a  large  amount  of 
FPGA  resources. 
The NFG is a useful technique to compute complicated numeric functions that would otherwise require a combination of several other arithmetic operations. The more demanding the numeric function, the more reason there is to use the NFG instead. The NFG is more efficient, in the sense of a shorter pipeline depth, in 10 out of the 15 functions that were investigated in this thesis (when using the non-uniform segmentation).

The  fixed  point  implementation  did  not  produce  all  of  the  desired  results.  The 
multiplication  required  more  programming  than  the  floating  point  implementation 
required,  but  the  results  had  errors  due  to  rounding  and  truncating  the  intermediate  and 
final  results.  This  area  needs  more  research  to  improve.  The  advantage  of  fixed  point 
implementation  is  that  it  requires  much  less  hardware  than  floating  point  and  therefore 
can  reduce  the  pipeline  depth  to  about  30%  of  the  pipeline  depth  required  by  the  floating 
point  implementation. 


B.  SUGGESTED  FUTURE  WORK 

1.  Hybrid  of  Uniform  and  Non-Uniform  Segmentation 

Uniform  segmentation  is  much  faster  and  less  complicated  than  non-uniform 
segmentation.  Although  non-uniform  segmentation  may  not  be  practical  on  its  own,  a 
hybrid  of  non-uniform  and  uniform  segmentation  would  take  advantage  of  the  strengths 
of  each. 

Consider a numeric function that is not suitable for uniform segmentation, such as $\sqrt{-\ln(x)}$, which appears in Figure 4 to demonstrate this fact. In the non-uniform segmentation of the same function, as in Figure 2, the restricting portion is the beginning of the domain. Therefore, to capture the most restricting part of the numeric function, segment the numeric function into a few non-uniform segments.

A good starting point is to determine an upper limit for the total number of constant (uniform) segments. Let us decide on 400 segments. If we dedicate 100 constant segments to the first portion of the numeric function $\sqrt{-\ln(x)}$, then change the segment size for another 100 constant segments and repeat this process four or five times, we will have four or five non-uniform segments, each containing a set of uniform segments.
This method would provide three advantages:

1. Relieve the segmentation constraint from the most restricting segment.

2. The segment index encoder would be small (5 groups of segments) and save FPGA space.

3. The indexing would be less complex once the input has been mapped to the correct group of segments.
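The indexing in such a hybrid scheme can be sketched as follows. This is only an illustration of the proposal, since the hybrid NFG was not built: the group boundaries, the choice of 100 uniform segments per group, and the function name `hybrid_index` are all hypothetical. A small encoder first selects the group, and the index within the group is then obtained by a subtraction and a scaling, as in uniform segmentation.

```python
SEGS_PER_GROUP = 100                          # uniform segments in each group
GROUP_BOUNDS = [0.0, 0.125, 0.25, 0.5, 1.0]   # hypothetical non-uniform groups


def hybrid_index(x, bounds=GROUP_BOUNDS, segs=SEGS_PER_GROUP):
    """Map x to a global segment index: group first, then uniform offset."""
    # Small encoder: find the group whose interval contains x.
    group = 0
    for i in range(len(bounds) - 1):
        if x >= bounds[i]:
            group = i
    # Uniform indexing inside the selected group.
    width = (bounds[group + 1] - bounds[group]) / segs
    offset = min(int((x - bounds[group]) / width), segs - 1)
    return group * segs + offset


print(hybrid_index(0.0))     # first segment of the first group
print(hybrid_index(0.126))   # first segment of the second group
print(hybrid_index(0.999))   # last segment of the last group
```

With four groups the encoder compares against only a handful of boundaries, instead of the hundreds of endpoints a priority selector would need for the same total number of segments.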


2.  Expand  the  Domain  of  the  NFG  via  Mapping 

The  functions  investigated  in  this  thesis  have  a  limited  domain  interval.  To  make 
the  functions  useful  for  a  wide  range  of  applications,  the  domain  interval  should  be 
increased.  Theoretical  research  is  being  conducted  in  this  field  [21]. 


3.  Build an HDL Multiplier Macro and Tap Off Desired Bits

If the multiplier in fixed point were built in a macro, the desired bits could be tapped off. This implementation would be both fast and accurate.


4.  Build a Rounding Macro

A macro can be built to round off shifted values in the fixed point implementation instead of truncating the values. This would improve the accuracy of the products and of the final result of the NFG.

5.  Efficient Segment Index Encoder vice Priority Selector Macros

The priority selectors are fast and work well, but take up a lot of hardware. Combined with the other hardware in the NFG, the priority selectors take up all the resources and limit the accuracy and flexibility of the NFG to handle all the functions. An implementation that uses a more efficient method for the segment index encoder would benefit the NFG.

Sasao and Butler have three suggestions: (1) LUT cascade, (2) content addressable memory and (3) EVBDD.

6.  Different Architecture

If  FPGA  resources  became  scarcer  and  one  wanted  to  implement  a  larger 
coefficients  table,  the  only  way  to  make  room  is  to  remove  the  major  consumers  of  real 
estate.  In  the  NFG,  it  would  be  the  segment  index  encoder  that  is  implemented  as  a 
priority  selector  macro  and  the  multipliers.  We  have  already  discussed  possible  solutions 
to  removing  large  selectors. 

Using Horner’s rule, a multiplier can be eliminated from the NFG. Equation (0.5) shows how to apply Horner’s rule to the NFG.

$$f(x) = c_2 x^2 + c_1 x + c_0 = (c_2 x + c_1)x + c_0 \qquad (0.5)$$

The NFG hardware would add one more adder stage. However, if the segment index encoder were able to work in one or two clocks, this would be a speed-up from the previous architecture, as long as the adder stages take fewer clocks than the multipliers. Floating point adders can take as many clocks as the multipliers, but in two’s complement or signed magnitude, the adders are faster than the multiplier.

In the previous architecture, computing $x^2$ takes many more clocks than the segment index encoder and adds to the pipeline depth.
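The saving can be seen by writing both forms out. The sketch below is a plain software comparison, with function names of our own choosing; it says nothing about clock counts on the SRC, only about the number of multiplications each form needs.

```python
def quadratic_direct(c2, c1, c0, x):
    """c2*x^2 + c1*x + c0: three multiplications (x*x, c2*x^2, c1*x)."""
    return c2 * (x * x) + c1 * x + c0


def quadratic_horner(c2, c1, c0, x):
    """(c2*x + c1)*x + c0: two multiplications and two additions."""
    return (c2 * x + c1) * x + c0


print(quadratic_direct(2.0, -3.0, 0.5, 1.5))   # 0.5
print(quadratic_horner(2.0, -3.0, 0.5, 1.5))   # 0.5
```

The two multiplications in Horner's form are sequential rather than parallel, which is why the benefit depends on the relative clock counts of the adder and multiplier stages discussed above.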

Figure  22  shows  an  overview  of  the  NFG  architecture  when  Horner’s  rule  has 
been  applied. 


[Figure 22 block diagram; labels: Input x, Segment Encoder, Coefficients Table, $(c_2 x + c_1) \cdot x$, $f(x) = c_2 x^2 + c_1 x + c_0$]

Figure 22.  Horner’s rule NFG architecture overview.
APPENDIX  A.  MATLAB  ALGORITHMS


The  following  MATLAB  Code  generates  the  segmentation  for  any  function; 
however  a  user  interface  has  been  added  for  convenience.  The  user  simply  picks  a 
number  instead  of  re-typing  the  entire  function  or  the  interval  for  evaluation.  The 
interface  limits  the  MATLAB  Code  to  the  suite  of  functions  found  in  Table  1. 

A.1  QUADRATIC  APPROXIMATION  USING  POLYFIT

This code implements the quadratic approximation using the MATLAB function polyfit. There are 6 files needed to run the non-uniform and uniform segmentation: QuadAppxPfit.m, multipleQuadApprox.m, varQuadApprox.m, dec2binfp.m, constantQuadApprox.m, and constQuadAppxWErr.m.

QuadAppxPfit.m is the top function where the program starts and ends. All the other files are child functions that provide the segmentation data back to this file for presentation / file storage.

multipleQuadApprox.m calls the non-uniform segmentation algorithms to collect the data for the segment endpoints and coefficients.

varQuadApprox.m tests proposed segments and finds the optimum width of the segment by comparing the approximation error to $\varepsilon$.

dec2binfp.m is the file that converts decimal numbers into binary. This is limited to converting one integer value and only up to 9 binary bits of accuracy.

constantQuadApprox.m is used for uniform segmentation when the number of segments is known beforehand. The key requirement is to input the number of segments desired; the approximation error is unspecified.

constQuadAppxWErr.m needs to have $\varepsilon$ specified; this file then computes the uniform segmentation of the numeric function that meets the constraint $\varepsilon$.
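The idea behind the non-uniform files (propose a segment, test its approximation error against $\varepsilon$, and shrink it if the test fails) can be sketched compactly. The code below is a simplified stand-in and not a translation of the MATLAB files: it uses a quadratic through three points instead of polyfit, and it halves a failing segment instead of searching for the optimum width. All names are hypothetical.

```python
import math


def fit_quadratic(f, a, b):
    """Quadratic through (a, f(a)), (m, f(m)), (b, f(b)), m the midpoint."""
    m = 0.5 * (a + b)
    fa, fm, fb = f(a), f(m), f(b)
    d1 = (fm - fa) / (m - a)                    # first divided difference
    d2 = ((fb - fm) / (b - m) - d1) / (b - a)   # second divided difference
    return lambda x: fa + d1 * (x - a) + d2 * (x - a) * (x - m)


def max_error(f, p, a, b, samples=200):
    """Largest |f - p| on an evenly spaced grid over [a, b]."""
    grid = (a + (b - a) * k / samples for k in range(samples + 1))
    return max(abs(f(x) - p(x)) for x in grid)


def segment(f, lo, hi, eps):
    """Greedy segmentation: return the upper endpoint of each segment."""
    ends = []
    a = lo
    while a < hi:
        b = hi
        # Halve the proposed segment until its error meets eps.
        while max_error(f, fit_quadratic(f, a, b), a, b) > eps:
            b = a + 0.5 * (b - a)
        ends.append(b)
        a = b
    return ends


ENDS = segment(math.sqrt, 1.0, 2.0, 2.0 ** -12)
print(len(ENDS), ENDS[-1])
```

Because halving is coarser than an optimum-width search, this sketch generally produces more segments than the algorithm used in this thesis for the same $\varepsilon$.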
FILE:  QuadAppxPfit.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Arbitrary_PW_Quadratic_Approx.m
% Created: January 6, 2006 (from Arbitrary_PW_Linear_Approx.m)
% Last modified: October 20, 2006
% Produced by: Tom Mack & Jon Butler
% Modified by: Njuguna Macaria for quadratic approximation
%
% This program produces a segmentation of a given function using either:
%   1. Uniform piecewise Quadratic approximation
%   2. Non-uniform piecewise Quadratic approximation
%   3. Both
%
% It is based on the algorithm:
%   1. For non-uniform, the MATLAB polyfit function
%   2. For uniform, dividing the range of the input into
%      equal, user-defined segments
%      or by using max error to determine max segment length
%      at the greatest curvature and then dividing the range
%      up into equal segments.
%   All with intercept shifting to balance the positive
%   and negative error
%
% Inputs
%   N        number of elements on which function is expressed
%   f(x)     function to be evaluated
%   x_low    low end of interval over which f(x) is evaluated
%   x_high   high end of interval over which f(x) is evaluated
%   epsilon  precision of approximation (for variable only)
%   consegs  number of segments to use to approximate (constant only)
%
% Outputs
%   Segment info - Segment #, Begin Pt, End Pt, Coefficients, & Error
%   Plot showing the approximation
%   Text file used to initialize memory in SRC (both Binary & Decimal)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

clear
close all
format long g
fprintf('\n')
fprintf('\n**************************************************************')
fprintf('\n')
fprintf('\n  QUADRATIC APPROXIMATION OF A FUNCTION USING POLYFIT  ')
fprintf('\n')
fprintf('\n')

%% Get FUNCTION to be approximated (user input)
func = input('Input the Function, func[sqrt(-1*log(x))]: ', 's');
if isempty(func)
    func = 'sqrt(-1*log(x))';   %% default
end

%% Get LOW range (user input)
x_low = input('Input the Lower Range of x - LOW value, x(low)[1/256]: ');
if isempty(x_low)
    x_low = 1/256;   %% default
end
%% Get HIGH range (user input)
x_high = input('Input the Higher Range of x - HIGH value, x(high)[1/4]:');
if isempty(x_high)
    x_high = 1/4;   %% default
end

%% Get CONSTANT or VARIABLE segmentation (User input)
vari_or_const = 0;
while vari_or_const ~= 1 && vari_or_const ~= 2 && vari_or_const ~= 3
    vari_or_const = ...
        input('(1)Non-uniform (2)Uniform Segmentation or (3)Both [1]:');
    if isempty(vari_or_const)
        vari_or_const = 1;   %% default Non-uniform
    end
end


%% If non-uniform segmentation, then enter ERROR parameters
if vari_or_const ~= 2
    epsilon = input('Input the Desired Error, epsilon[0.0001]: ');
    if isempty(epsilon)
        epsilon = 0.0001;   %% default
    end
end

%% If uniform segmentation, find how the user will restrict # of segments
if vari_or_const == 2
    err_or_segs = ...
        input('Constrain by (1)Number of Segments or (2)Error [1]: ');
    if isempty(err_or_segs)
        err_or_segs = 1;   %% default
    end
    if err_or_segs == 1
        consegs = input('Input the number of Desired Segments[200]: ');
        if isempty(consegs)
            consegs = 200;   %% default
        end
    end
    if err_or_segs == 2
        epsilon = input('Input the Desired Error, epsilon[0.0001]: ');
        if isempty(epsilon)
            epsilon = 0.0001;   %% default
        end
    end
end

N = input('Input the no. of pts the fct is to be evaluated; N[10000]: ');
if isempty(N)
    N = 10000;   %% default
end

% eqn = input('Input the equation to use:
%   (1) F(x)=ax^2+bx+c or (2) F(x)=a(x-p)^2+b(x-p)+c, [1]:
% if isempty(eqn)
%     eqn = 1;   %% default
% end
eqn = 1;

%%% Based on the number of points to be used for the curve, find the
%%% x values to calculate and spread over the approximating function
N = N * (x_high - x_low);
x = linspace(x_low, x_high, N);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The segments in this program do NOT overlap (i.e. the first element of
% the NEXT segment is NOT the last element of the LAST segment).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

eval(['F = ', func, ';'])   % Evaluate the function and place values in F

% Print demarcation line
fprintf('\n******************************************************')
fprintf('\n')

%********* Segmentation Algorithm
%********* REPEAT FOR EACH i

repeat = 1;
while repeat == 1
    if (mod(vari_or_const,2) == 1)
        [endpt,seg_end_point,c_2,c_1,c_0] = multipleQuadApprox(x, F, epsilon);
    end
    if (vari_or_const == 2) && (err_or_segs == 1)
        [endpt,seg_end_point,c_2,c_1,c_0] = constantQuadApprox(x,F,consegs);
    end
    if ((vari_or_const == 2) && (err_or_segs == 2)) || (vari_or_const == 4)
        [endpt,seg_end_point,c_2,c_1,c_0] = constQuadAppxWErr(x,F,epsilon);
    end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Compute and plot function, approximate function and error           %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    ind = 1;
    for i = 1:length(seg_end_point)      % Index for each segment
        m = 1;                           % Index within each segment
        XP = [];
        FP = [];
        Error = [];
        while (ind < seg_end_point(i))
            XP(m)    = x(ind);
            FNC(m)   = F(ind);           % Actual function (Fct No correction)
            FP(m)    = c_2(i)*((x(ind))^2) + c_1(i)*x(ind) + c_0(i);   % Approx
            Error(m) = FNC(m) - FP(m);
            ind      = ind + 1;
            m        = m + 1;
        end %while

        MaxError(i) = max(abs(Error));   % Keep track of all errors
        if (mod(i,2) == 0)   %% Plot every other segment a different color
            figure(mod(vari_or_const,2)+1)         %% Blue
            plot(XP,FP)
            figure(mod(vari_or_const,2)+3)         %% Blue
            plot(XP,Error)
        else
            figure(mod(vari_or_const,2)+1)
            plot(XP,FP,'r','LineWidth',2)          %% Red
            figure(mod(vari_or_const,2)+3)
            plot(XP,Error,'r','LineWidth',2)       %% Red
        end %if (mod(i,2) == 0)
        figure(mod(vari_or_const,2)+1)
        hold on
        xlabel('x','FontSize',10)
        ylabel('f(x)','FontSize',10)
        if (mod(vari_or_const,2) == 1)
            title(['NON-UNIFORM f(x) segmentation. No. of segments = ', ...
                num2str(length(seg_end_point)),'.'],'FontSize',10)
        elseif (mod(vari_or_const,2) == 0)
            title(['UNIFORM f(x) segmentation. No. of segments = ', ...
                num2str(length(seg_end_point)),'.'],'FontSize',10)
        end
        figure(mod(vari_or_const,2)+3)
        hold on
        xlabel('x','FontSize',14)
        % Pick the maximum error from all the segments
        ylabel(['Error(x). Max Error = ',num2str(max(MaxError))], ...
            'FontSize',10)
        if (mod(vari_or_const,2) == 1)
            title(['Error for NON-UNIFORM f(x) segmentation. No. of segs = ', ...
                num2str(length(seg_end_point)),'.'],'FontSize',10)
        elseif (mod(vari_or_const,2) == 0)
            title(['Error for UNIFORM f(x) segmentation. No. of segs = ', ...
                num2str(length(seg_end_point)),'.'],'FontSize',10)
        end
    end %for i = 1:length(seg_endpt)
    figure(mod(vari_or_const,2)+1)
    plot(x,F)   % Plot function on same figure as piecewise approximation
    stem(x(seg_end_point),F(seg_end_point))
    hold off


%%%%%%%%%%%%%%%%% Decimal to Binary Conversion Algorithm %%%%%%%%%%%%%%%%
% Convert string end points, c_1 and c_0 into a binary string with 1
% integer bit and 8 fraction bits and print results table.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

if (mod(vari_or_const,2) == 1)
    fprintf('\n NON-UNIFORM Segmentation')
elseif (mod(vari_or_const,2) == 0)
    fprintf('\n UNIFORM Segmentation')
end

if eqn == 1
    % Print the two header rows of the results table
    fprintf(['\n Segment  End Point  End Point  c_2 ',...
        ' c_2  c_1  c_1  c_0 ',...
        'c_0'])
    fprintf(['\n Number  (Decimal)  (Binary)  (Decimal) ',...
        ' (Binary)  (Decimal)  (Binary) ',...
        '(Decimal)  (Binary)'])
end

for i = 1:length(seg_end_point)
    xbin(i)      = dec2binfp(x(seg_end_point(i)));
    segment(i+1) = x(seg_end_point(i));       % Used in next program
    c_2bin(i)    = dec2binfp(c_2(i));
    c_1bin(i)    = dec2binfp(c_1(i));
    c_0bin(i)    = dec2binfp(c_0(i));
    if eqn == 1
        % Print Remaining Results Table
        fprintf(['\n %3d %8.6f %019.9f %10.5f %019.9f ',...
            '%10.5f %019.9f %10.5f %019.9f'], i-1, ...
            x(seg_end_point(i)), xbin(i), c_2(i), c_2bin(i), c_1(i),...
            c_1bin(i), c_0(i), c_0bin(i))
    end % if eqn == 1
end %for i = 1:length(seg_end_point)


% Create text file of Binary values to initialize memory
memBin = [c_2bin .* 10^9; c_1bin .* 10^9; c_0bin .* 10^9]  % Memory with
fid = fopen('memory.mem','w');
fprintf(fid,'\n%018.0f%018.0f%018.0f',memBin);
fclose(fid);

% Create text file of Decimal Values to initialize memory
fid = fopen('memDEC.mem','w');
format long g;
fprintf(fid,'%5d', length(seg_end_point));  % Number of Segments
memDEC = [segment(2:end); c_2; c_1; c_0]
fprintf(fid,'\n%18.12f %18.12f %18.12f %18.12f',memDEC);
fclose(fid);
%End text file creation


if eqn == 2
    %%% The following created from: Extract_PL_Params.
    % This program extracts from the segmentation and the function,
    %   1. Squared term coefficient
    %   2. Linear term coefficient
    %   3. Constant
    % which are the parameters needed to store in the coefficients
    % memory. It produces the BINARY values of these parameters.
    %
    % The segmentation occurs as a vector of end points.
    fprintf('\n')
    fprintf('\n')
    segment(1) = 0;
    for i = 1:length(segment)
        seg_index(i) = floor(N*segment(i)/(x_high-x_low))+1;
    end %for i = 1:length(segment)
    seg_index;

    for i = 2:length(segment)
        slope(i-1) = (F(seg_index(i)-1) - F(seg_index(i-1)))/...
            (x(seg_index(i)-1) - x(seg_index(i-1)));
        intercept(i-1) = F(seg_index(i)-1) - slope(i-1)*x(seg_index(i)-1);
        a = max(F(seg_index(i-1):seg_index(i)-1) ...
            - (slope(i-1).*x(seg_index(i-1):seg_index(i)-1)...
            + intercept(i-1)));
        b = min(F(seg_index(i-1):seg_index(i)-1) ...
            - (slope(i-1).*x(seg_index(i-1):seg_index(i)-1)...
            + intercept(i-1)));
        error(i-1) = 0.5*(a + b);  %YES, it is a + b.
        intercept(i-1) = intercept(i-1) + error(i-1) + slope(i-1)*segment(i-1);
        s_m_e(i-1) = segment(i) - segment(i-1);
        c1x(i-1) = s_m_e(i-1)*slope(i-1);
        approx(i-1) = c1x(i-1) + intercept(i-1);
        exact(i-1) = 2^segment(i);  %Exact value of segment.
    end %for i = 2:length(segment)

    fprintf('\nDECIMAL values for Approx = slope*(x - pivot) + intercept.')
    fprintf(['\nseg no.  [s,  e]  slope  intercept  ',...
        'pivot  approx_error  e-s  (e-s)*slope  (e-s)',...
        '*slope+intercept  exact f(x)\n'])
    for i = 1:length(segment)-1
        fprintf(['%1.0f  [%8.6f  %8.6f]  %8.6f  %8.6f  %8.6f  %8.6f',...
            '%8.6f  %8.6f  %8.6f  %8.6f  \n'], i-1, segment(i),...
            segment(i+1), slope(i), intercept(i), segment(i), error(i),...
            s_m_e(i), c1x(i), approx(i), exact(i))
    end %for i = 1:length(segment)-1
    %hold on
    %plot(x(1:N),slope(1).*x(1:N)+intercept(1))

    %Convert s, e, slope, intercept, and pivot to binary.
    fprintf('\nBINARY values')
    fprintf(['\nseg no.  slope  intercept ',...
        'approx error  (e-s)*slope  (e-s)*sl+intercept  exact f(x)\n'])
    % Loop bound is length(segment)-1 because segment(i+1) is read below
    for i = 1:length(segment)-1
        digits = ceil(log2(length(segment)-1));
        s_seg_no = dec2bin(i-1,digits);
        s_s(i) = dec2binfp(segment(i));
        s_e(i) = dec2binfp(segment(i+1));
        s_slope(i) = dec2binfp(slope(i));
        s_intercept(i) = dec2binfp(intercept(i));
        if error(i) < 0;
            error(i) = abs(error(i));
        end % if error(i) < 0;
        s_error(i) = dec2binfp(error(i));
        s_s_m_e(i) = dec2binfp(s_m_e(i));
        s_c1x(i) = dec2binfp(c1x(i));
        s_approx(i) = dec2binfp(approx(i));
        s_exact(i) = dec2binfp(exact(i));
        fprintf(['%s  [%10.8f  %10.8f]  %10.8f  %10.8f  %10.8f  %10.8f',...
            '%10.8f  %10.8f  %10.8f  \n'], s_seg_no, s_s(i), s_e(i), ...
            s_slope(i), s_intercept(i), s_error(i), s_s_m_e(i), ...
            s_c1x(i), s_approx(i), s_exact(i))
    end %for i
end % if eqn == 2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

fprintf('\n ')
fprintf('\n************************************************************** ')
fprintf('\n')
if vari_or_const ~= 3
    repeat = 0;
end

if vari_or_const == 3
    vari_or_const = 4;
end

end % while repeat = 1
% End file: QuadAppxPfit.m
FILE: multipleQuadApprox.m

function [endpt,indx,c2,c1,c0] = multipleQuadApprox(x,fct,max_error)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This function will produce multiple Quadratic-line approximations of a
% given function to within the bounds of max_error provided.
%
% Created by Tom Mack for linear approximations
% Created: Mar 31, 2006
%
% Modified for Quadratic approximations by Njuguna Macaria
% Modified: Dec 30, 2006
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

i = 1;
indx = 1;
seg_no = 1;
endpt = [];
c2 = [];
c1 = [];
c0 = [];

while i < length(fct)
    [endpt(seg_no),indx(seg_no),c2(seg_no),c1(seg_no),c0(seg_no)] = ...
        varQuadApprox(x,fct,max_error,i);
    i = indx(seg_no) + 1;
    seg_no = seg_no + 1;
end
FILE: varQuadApprox.m

function [endpt,i,c2,c1,c0] = varQuadApprox(x,fct,max_error,indx)
% This function creates a 2nd Order approximation of a given function
% using the polyfit function. It continues to calculate polyfits until
% maximum error is exceeded.
%
% Linear approximation Created by Tom Mack >> Mar 31, 2006
%
% Modified for Quadratic approximation by Njuguna Macaria
% Modified: Dec 29, 2006

for i=indx:length(fct);
    p = polyfit(x(indx:i),fct(indx:i),2);   % Fit equ to 2nd order poly
    c_2(i) = p(1);                          % Coefficient of X^2
    c_1(i) = p(2);                          % Coefficient of X
    c_0(i) = p(3);                          % Intercept of polynomial

    approx(indx:i) = p(1)*(x(indx:i)).^2 + p(2)*x(indx:i) + p(3);
    errors = approx(indx:i) - fct(indx:i);
    maxposerror = max(errors);
    maxnegerror = min(errors);
    c_0delta(i) = abs((abs(maxposerror) - abs(maxnegerror))/2);
    % If the negative error is bigger, then the delta should be negative
    if abs(maxnegerror) > abs(maxposerror)
        c_0delta(i) = -1 * c_0delta(i);
    end % if

    approx(indx:i) = approx(indx:i) - c_0delta(i);
    errors = approx(indx:i) - fct(indx:i);
    error = max(abs(errors));

    % If exceeded the max error, then go back to the previous endpoint
    if error > max_error
        i = i-1;
        endpt = x(i);
        c2 = c_2(i);
        c1 = c_1(i);
        c0 = c_0(i);
        return
    end % if error > max
end % for i=indx+1:length(fct)

endpt = x(i);    % Removed i = i-1;
c2 = c_2(i);
c1 = c_1(i);
c0 = c_0(i);


FILE: dec2binfp.m

function [binfp] = dec2binfp(x,n)
% This function converts a decimal number to a fixed point binary number
% with one integer followed by n points to the right of the decimal
%
% Created by Tom Mack
% Last modified: August 22, 2006
%
% Inputs
%   x = decimal number to be converted (does not have to be an integer)
%   n (optional, default 9) = bit resolution to the right and left of decimal pt
% Outputs
%   binfp = binary floating point representation
%   Negative inputs are output in 18-bit (9.9) format

if nargin < 2, n = 9; end
if isnan(x) == 1,
    binfp = NaN;
    return
elseif x == Inf
    binfp = Inf;
    return
elseif x < 0,
    x = (x * 2^n) + 2^(2*n);
    x = dec2bin(x,18);
    x = str2double(x);
    x = x / 10^n;
    binfp = x;
    return
else
    x = x * 2^n;
    x = dec2bin(x,18);
    x = str2double(x);
    x = x / 10^n;
    binfp = x;
end


FILE: constQuadAppxWErr.m

function [endpt,indx,c2,c1,c0] = constQuadAppxWErr(x,fct,max_error)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This function will produce multiple Quadratic-line approximations of a
% constant size of a given function to within the bounds of the
% max_error provided. Coefficients & intercept calculated using polyfit.
% Intercept adjusted to balance max positive and negative errors.
%
% Created by Tom Mack for linear approximations
% Created: July 10, 2006
% Modified: July 11, 2006
%
% Modified again by Njuguna Macaria for Quadratic approximations
% Modified: Dec 30, 2006
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Compute # of segs
firstderiv = diff(fct)./diff(x);
secndderiv = diff(firstderiv)./diff(x(1:length(firstderiv)));
[dermax,i] = max(abs(secndderiv));
error = 0;
loop_stop = 0;
i_low = i - 1;
if i_low <= 0
    i_low = 1;
end
i_high = i + 1;
if i_high > length(fct)
    i_high = length(fct);
end

% If error is too small, increase until just under the max error
% This gives the max size of the segment within the desired error
while error < max_error || loop_stop < length(fct)
    i_low = i_low - 1;
    if i_low <= 0
        i_low = 1;
    end
    i_high = i_high + 1;
    if i_high > length(fct)
        i_high = length(fct);
    end

    % Get coefficients, approximate function and find error
    % Adjust function based on the error (move it up or down)
    p = polyfit(x(i_low:i_high),fct(i_low:i_high),2);
    approx(i_low:i_high) = p(1)*(x(i_low:i_high)).^2 + ...
        p(2)*x(i_low:i_high) + p(3);
    errors = approx(i_low:i_high) - fct(i_low:i_high);
    maxposerror = max(errors);
    maxnegerror = min(errors);
    c_0delta = abs((abs(maxposerror) - abs(maxnegerror))/2);

    % Figure out if the error is positive or negative and move the function
    % to compensate and balance the error of the approximated function
    if abs(maxnegerror) > abs(maxposerror)
        c_0delta = -1 * c_0delta;
    end % if

    % Re-check the error and find the max error
    approx(i_low:i_high) = approx(i_low:i_high) - c_0delta;
    errors = approx(i_low:i_high) - fct(i_low:i_high);
    error = max(abs(errors));

    % If error is larger than should be
    if error > max_error
        i_low = i_low + 1;
        i_high = i_high - 1;
    end
    loop_stop = loop_stop + 1;
end

segsize = i_high - i_low;
consegs = ceil(length(fct)/segsize);

% Determine Coefficients of segments
idx = 1;
for i = 1:consegs
    indx(i) = round((length(x)/consegs)*i);
    if indx(i) == 0
        indx(i) = 1;
    end
    if i==consegs
        indx(i) = length(x);
    end
    endpt(i) = x(indx(i));
    p = polyfit(x(idx:indx(i)),fct(idx:indx(i)),2);
    approx(idx:indx(i)) = p(1)*(x(idx:indx(i))).^2 + ...
        p(2)*x(idx:indx(i)) + p(3);
    errors = approx(idx:indx(i)) - fct(idx:indx(i));
    maxposerror = max(errors);
    maxnegerror = min(errors);
    c_0delta = abs(abs(maxposerror) - abs(maxnegerror))/2;
    if abs(maxnegerror) > abs(maxposerror)
        c_0delta = -1 * c_0delta;
    end % if
    c2(i) = p(1);
    c1(i) = p(2);
    c0(i) = p(3) - c_0delta;  % Constant shift to balance pos & neg error
    idx = indx(i)+1;
    i = i + 1;
end


FILE: constantQuadApprox.m

function [endpt,indx,c2,c1,c0] = constantQuadApprox(x,fct,constsegs)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This function will produce multiple Quadratic line approximations of a
% given function to within the bounds of the number of segments provided.
% Coefficients calculated by polyfit. Intercept adjusted to balance
% maximum positive and negative errors.
%
% Created by Tom Mack for linear approximations
% Created: June 4, 2006
%
% Modified for Quadratic approximations by Njuguna Macaria
% Modified: July 11, 2006
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

idx = 1;

for i = 1:constsegs
    indx(i) = round((length(x)/constsegs)*i);
    if i==constsegs
        indx(i) = length(x);
    end
    endpt(i) = x(indx(i));
    p = polyfit(x(idx:indx(i)),fct(idx:indx(i)),2);

    approx(idx:indx(i)) = p(1)*(x(idx:indx(i))).^2+p(2)*x(idx:indx(i))+p(3);
    errors              = approx(idx:indx(i)) - fct(idx:indx(i));
    maxposerror         = max(errors);
    maxnegerror         = min(errors);
    c_0delta            = abs((abs(maxposerror) - abs(maxnegerror))/2);

    if abs(maxnegerror) > abs(maxposerror)
        c_0delta = -1 * c_0delta;
    end % if
    c2(i) = p(1);
    c1(i) = p(2);
    c0(i) = p(3) - c_0delta;  % Intercept shift to balance pos & neg error
    idx = indx(i)+1;
    i = i+1;
end
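
The intercept shift used in the A.1 listings (the quantity named c_0delta) can be illustrated in isolation. The following is a minimal Python sketch, not the thesis's MATLAB code; the function name balance_intercept and the sample error values are hypothetical. It assumes the fit errors take both signs, as least-squares residuals do, in which case subtracting the shift makes the largest positive and largest negative errors equal in magnitude.

```python
def balance_intercept(errors):
    """Return the shift to subtract from the constant term c0.

    `errors` holds (approximation - function) at each sample point,
    matching the sign convention of the listings above.
    """
    max_pos = max(errors)
    max_neg = min(errors)
    delta = abs(abs(max_pos) - abs(max_neg)) / 2
    # If the negative error is bigger, the shift must be negative
    if abs(max_neg) > abs(max_pos):
        delta = -delta
    return delta


# Hypothetical error samples: +0.75 at worst above, -0.25 at worst below
errors = [0.75, 0.25, -0.25]
delta = balance_intercept(errors)
shifted = [e - delta for e in errors]
print(delta, max(shifted), min(shifted))
```

After the shift the extreme errors in this example are +0.5 and -0.5, so the worst-case error drops from 0.75 to 0.5 without changing c_2 or c_1.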


A.2  QUADRATIC  APPROXIMATION  USING  REMEZ  ALGORITHM 

The design presented in this thesis uses the Remez algorithm. The following files were
developed to compute the segmentation. The top level file is QuadAppxRemz.m, which
calls a set of user-written MATLAB functions to display and request the user input
(UserInput.m), obtain the numeric functions selected by the user and their respective
domain intervals (getF.m) and then compute the segmentation.

Non-uniform segmentation is performed by multipleQuadApprox.m in
conjunction with varQuadApproxHyb3AvgThird.m and chebyRemz.m. chebyRemz.m
takes the place of polyfit, the optimized, user-callable MATLAB function used in the
listings of A.1 above.
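
The greedy idea behind non-uniform segmentation (grow a segment until the error bound is exceeded, then step back and start the next segment) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis code: a quadratic through three sample points stands in for the polyfit or Remez fit, and the names quad_through, max_abs_error and greedy_segments are hypothetical.

```python
def quad_through(p0, p1, p2):
    """Return the quadratic through three (x, y) points (Lagrange form)."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2

    def q(x):
        return (y0 * (x - x1) * (x - x2) / ((x0 - x1) * (x0 - x2))
                + y1 * (x - x0) * (x - x2) / ((x1 - x0) * (x1 - x2))
                + y2 * (x - x0) * (x - x1) / ((x2 - x0) * (x2 - x1)))
    return q


def max_abs_error(xs, ys, lo, hi):
    """Largest |quadratic - function| over samples lo..hi (inclusive)."""
    if hi - lo < 2:
        return 0.0  # one or two samples are matched exactly
    mid = (lo + hi) // 2
    q = quad_through((xs[lo], ys[lo]), (xs[mid], ys[mid]), (xs[hi], ys[hi]))
    return max(abs(q(xs[k]) - ys[k]) for k in range(lo, hi + 1))


def greedy_segments(xs, ys, eps):
    """Return the end index of each segment; neighbours share an end point."""
    ends, start, last = [], 0, len(xs) - 1
    while start < last:
        end = min(start + 2, last)
        # Extend the segment while the next sample still keeps error <= eps
        while end < last and max_abs_error(xs, ys, start, end + 1) <= eps:
            end += 1
        ends.append(end)
        start = end
    return ends


xs = [k / 100 for k in range(101)]
ends = greedy_segments(xs, [x ** 3 for x in xs], 1e-4)
print(len(ends), ends[-1])
```

A function that is already quadratic yields a single segment, while regions that deviate more strongly from a quadratic yield shorter segments, which is the behaviour the non-uniform method relies on.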

Uniform segmentation is performed by two other files. If the number of segments
is known without explicit input of ε, then constantQuadApprox.m is the file that is used.
If, on the other hand, ε is defined and uniform segmentation is desired, then
constQuadAppxWErr.m is the file that is used.
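
For the case where the number of segments is given, the segment end points follow directly from the sample count. The sketch below is a Python illustration (not the thesis's MATLAB) of the index rule round((length(x)/constsegs)*i) that appears in constantQuadApprox.m; the function name uniform_end_indices is hypothetical, and rounding half up is used here on the assumption that it matches MATLAB's round for positive values.

```python
import math


def uniform_end_indices(n_points, n_segments):
    """1-based index of the last sample in each uniform segment."""
    ends = []
    for i in range(1, n_segments + 1):
        # Round half up; Python's built-in round() rounds half to even
        idx = int(math.floor(n_points / n_segments * i + 0.5))
        if i == n_segments:
            idx = n_points  # force the last segment to end on the last sample
        ends.append(idx)
    return ends


print(uniform_end_indices(10, 3))
```

When the sample count is not a multiple of the segment count, the segments differ in length by at most one sample.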

The file twosComp.m was developed to convert the data to a two's complement,
fixed point binary, hexadecimal or decimal number. Note that the two's complement decimal
number is not the same as a float or double data type.
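
Since twosComp.m itself is not listed in this section, the following Python sketch only illustrates the kind of conversion described, under the assumption of a fixed point word with intLen integer bits and fractLen fraction bits. The function name twos_comp_bits and its rounding and overflow behaviour are hypothetical and may differ from the actual file.

```python
def twos_comp_bits(value, int_len, fract_len):
    """Two's complement fixed point bit string for `value`.

    The word has int_len integer bits (including the sign bit) and
    fract_len fraction bits.
    """
    total = int_len + fract_len
    scaled = int(round(value * (1 << fract_len)))  # move the binary point
    lo, hi = -(1 << (total - 1)), (1 << (total - 1)) - 1
    if not lo <= scaled <= hi:
        raise OverflowError("value does not fit in the requested format")
    # Masking a negative integer yields its two's complement bit pattern
    return format(scaled & ((1 << total) - 1), "0{}b".format(total))


bits = twos_comp_bits(-1.5, 16, 16)
print(bits)                           # binary form
print(format(int(bits, 2), "08X"))    # hexadecimal form
print(int(bits, 2))                   # "decimal" form of the same bit pattern
```

The last line shows why the two's complement decimal number differs from a float or double: the bit pattern for -1.5 in a 16.16 format reads as the unsigned integer 4294868992, not as -1.5.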
FILE: QuadAppxRemz.m

% QuadAppxRemz.m
% Created:        January 6, 2007
% Created by:     Njuguna Macaria
% Last modified:  August 3, 2007
% Modified by:    Njuguna Macaria
%
% This program produces a segmentation of a given function using either:
%   1. Uniform Quadratic approximation
%   2. Non-uniform piecewise Quadratic approximation
%   3. Both
%
% It is based on the algorithm:
%   1. For non-uniform, the MATLAB Remez algorithm
%   2. For uniform, dividing the range of the input into
%      equal, user-defined segments
%      or by using max error to determine max segment length
%      at the greatest curvature and then dividing the range
%      up into equal segments.
%
% Inputs
%   Inputs are taken from an input function; "UserInput();"
%   N             - number of elements on which function is expressed
%   eqn           - (1) F(x)=ax^2+bx+c OR PIVOT: (2) F(x)=a(x-p)^2+b(x-p)+c
%   x_low         - low end of interval over which f(x) is evaluated
%   x_high        - high end of interval over which f(x) is evaluated
%   func(x)       - function to be evaluated
%   epsilon       - precision of approximation (for variable only)
%   consegs       - number of segments to use to approximate (constant only)
%   err_or_segs   - Constant segmentation; decide # of segments or err bound
%   vari_or_const - Variable or constant segmentation
%
% Outputs
%   Segment info - Segment #, Begin Pt, End Pt, Coefficients, & Error
%   Plot showing the approximation
%   Text file used to initialize memory in SRC (both Binary & Decimal)
%
% INPUT OF USER-SPECIFIED PARAMETERS

clear
clc
close all
format long g;

% Get user input
% profile on  % For use when debugging. Find runtimes
sel = UserInput();
[f,interval,vari_or_const,err_or_segs,consegs,epsilon,N] = getF(sel);

%%% Based on the number of points to be used for the curve, find the
%%% x values to calculate and spread over the approximating function
syms x
eval(['func = ',f,';'])
eval(['intv = ',interval,';'])
x_pts    = linspace(intv(1), intv(2), N);
vecFunc  = inline(vectorize(func));  %Vectorized version of func.
y_actual = vecFunc(x_pts);           %Evaluate the function with x_pts

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% The segments in this program overlap (i.e. the first element of
% the NEXT segment IS the last element of the LAST segment).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%Print demarcation line
fprintf('\n********************************************************\n')
fprintf('\n')

%%%%%%%%%%%%%%%%%%%%%%%% Segmentation Algorithm %%%%%%%%%%%%%%%%%%%%%%%%%

repeat = 1;
while repeat == 1

if (mod(vari_or_const,2) == 1)
    [endpt,seg_end_point,c_2,c_1,c_0] = ...
        multipleQuadApprox(x_pts,func,epsilon);
end

if (vari_or_const == 2) && (err_or_segs == 1)
    [endpt,seg_end_point,c_2,c_1,c_0] = ...
        constantQuadApprox(x_pts,vecFunc,consegs);
end

if ((vari_or_const == 2) && (err_or_segs == 2)) || (vari_or_const == 4)
    [endpt,seg_end_point,c_2,c_1,c_0] = ...
        constQuadAppxWErr(x_pts,func,epsilon);
end

fprintf('\n********************************************************\n')
fprintf('\n\nBack from all the Segmentation\n\n')
fprintf('\n********************************************************\n')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Compute and plot function, approximate function and error           %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

for i = 1:length(seg_end_point)-1;
    % looking at each segment find the approximate and actual points
    XP = x_pts(seg_end_point(i):seg_end_point(i+1));
    c = [c_2(i),c_1(i),c_0(i)];
    FNC = vecFunc(XP);
    FP = polyval(c,XP);
    Error = FP - FNC;
    MaxError(i) = max(abs(Error));

    if (mod(i,100)==0)  % Only used when trying to limit graphing
        if (mod(i,2) == 0)  % Plot every other segment a different color
            figure(mod(vari_or_const,2)+1)    %% Blue
            plot(XP,FP)
            figure(mod(vari_or_const,2)+3)    %% Blue
            plot(XP,Error)
        else
            figure(mod(vari_or_const,2)+1)
            plot(XP,FP,'r','LineWidth',2)     %% Red
            figure(mod(vari_or_const,2)+3)
            plot(XP,Error,'r','LineWidth',2)  %% Red
        end %if (mod(i,2) == 0)

        figure(mod(vari_or_const,2)+1)
        hold on
        xlabel('x','FontSize',10)
        ylabel('f(x)','FontSize',10)
        if (mod(vari_or_const,2) == 1)
            title(['NON-UNIFORM f(x) = ',f, ...
                ' segmentation. No. of ',...
                'segments = ',...
                num2str(length(seg_end_point)-1)],...
                'FontSize',10)
        elseif (mod(vari_or_const,2) == 0)
            title(['UNIFORM f(x) = ',f, ...
                ' segmentation. No. of segments = ',...
                num2str(length(seg_end_point)-1)],...
                'FontSize',10)
        end

        figure(mod(vari_or_const,2)+3)
        hold on
        xlabel('x','FontSize',14)
        errPwr2 = log2(max(MaxError));
        ylabel(['Max Error = ',num2str(max(MaxError)),' = 2\^',...
            num2str(errPwr2)],'FontSize',10)
        if (mod(vari_or_const,2) == 1)
            title(['Error for NON-UNIFORM f(x) = ',f, ...
                ' segmentation. No. of segs = ',...
                num2str(length(seg_end_point)-1)],...
                'FontSize',10)
        elseif (mod(vari_or_const,2) == 0)
            title(['Error for UNIFORM f(x) = ',f, ...
                ' segmentation. No. of segs = ',...
                num2str(length(seg_end_point)-1)],...
                'FontSize',10)
        end
    end % if (mod(i,100)==0) Graphing STOP/START
end %for i = 1:length(seg_endpt)

figure(mod(vari_or_const,2)+1)
plot(x_pts,y_actual)  % Plot func on same fig as piecewise approx
stem(x_pts(seg_end_point),y_actual(seg_end_point))
hold off

%%%%%%%%%%%%%%%%% Decimal to Binary Conversion Algorithm %%%%%%%%%%%%%%%%

% Print whether Uniform or Non-uniform %
if (mod(vari_or_const,2) == 1)
    fprintf('\n NON-UNIFORM Segmentation')
elseif (mod(vari_or_const,2) == 0)
    fprintf('\n UNIFORM Segmentation')
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Convert to Twos Complement (32.32)
% and save in a file.
fractLen = 32;            % 32 bits to represent the fraction
intLen = 64-fractLen;     % 32 bits to represent the integer

% Convert to Twos Complement (16.16)
% and save in a file.
fractLen = 16;            % 16 bits to represent the fraction
intLen = 32 - fractLen;   % 16 bits to represent the integer


% BINARY FILE

% Create text file of Binary values to initialize memory
fid = fopen('memBIN.mem','w');
fprintf(fid,'%d', length(seg_end_point)-1);  % Number of Segments

% Convert the values to binary and save in the file
for i = 1:length(seg_end_point)-1
    xbin(i,:) = twosComp(x_pts(seg_end_point(i+1)),intLen, fractLen);
    segmnt(i) = x_pts(seg_end_point(i+1));  % Used in next program
    c_2bin(i,:) = twosComp(c_2(i),intLen, fractLen);
    c_1bin(i,:) = twosComp(c_1(i),intLen, fractLen);
    c_0bin(i,:) = twosComp(c_0(i),intLen, fractLen);
    memBin = [xbin(i,:),' ',c_2bin(i,:),' ',...
        c_1bin(i,:),' ',c_0bin(i,:)];
    fprintf(fid,'\n%s',memBin);
    fprintf(fid,'\n', xbin(i,:),' ',c_2bin(i,:),...
        ' ',c_1bin(i,:),' ',c_0bin(i,:));
end %for i = 1:length(seg_end_point)
fclose(fid);


% HEXADECIMAL FILE

% Create text file of Binary values to initialize memory
fid = fopen('memHEX0x.mem','w');
Num_of_Segments = length(seg_end_point)-1;
fprintf(fid,'%6d', Num_of_Segments);  % Number of Segments

% for uniform segmentation, store a step size
if (vari_or_const == 2) | (vari_or_const == 4)
    step_len = Num_of_Segments/(intv(2) - intv(1));
    fprintf(fid,'\n0x%s', twosComp(step_len,intLen, fractLen));
end

% Convert the values to binary and save in the file
for i = 1:length(seg_end_point)-1
    xbin(i,:) = twosComp(x_pts(seg_end_point(i+1)),intLen, fractLen);
    segmnt(i) = x_pts(seg_end_point(i+1));  % Used in next program
    c_2bin(i,:) = twosComp(c_2(i),intLen, fractLen);
    c_1bin(i,:) = twosComp(c_1(i),intLen, fractLen);
    c_0bin(i,:) = twosComp(c_0(i),intLen, fractLen);
    memBin = [['0x',xbin(i,:)],' ',...
        ['0x',c_2bin(i,:)],' ',...
        ['0x',c_1bin(i,:)],' ',...
        ['0x',c_0bin(i,:)]];
    fprintf(fid,'\n%s',memBin);
    fprintf(fid,'\n',xbin(i,:),' ',c_2bin(i,:),...
        ' ',c_1bin(i,:),' ',c_0bin(i,:));
end %for i = 1:length(seg_end_point)
fclose(fid);


% DECIMAL FILE

% Create text file of Decimal Values to initialize memory
fid = fopen('memDEC.mem','w');
fprintf(fid,'%6d', Num_of_Segments);  % Number of Segments

% for uniform segmentation, store a step size
if (vari_or_const == 2) || (vari_or_const == 4)
    step_len = Num_of_Segments/(intv(2) - intv(1));
    fprintf(fid,'\n%26.18f', step_len);  % Step size in Decimal
end

memDEC = [segmnt(1:end); c_2; c_1; c_0]
maxCoef = max(memDEC);
minCoef = min(memDEC);
fprintf(fid,'\n%26.18f %26.18f %26.18f %26.18f',memDEC);
fclose(fid);
%End text file creation

fprintf('\n')
fprintf('\n********************************************************\n')
if vari_or_const ~= 3
    repeat = 0;
end

if vari_or_const == 3
    vari_or_const = 4;
end

% % % profile viewer
% % pr = profile('info');
% % profsave(pr,'profile_results')

end % End while repeat == 1

% % % maxCoef = max(maxCoef)  % for debugging to find number range
% % % minCoef = min(minCoef)

% End file: QuadAppxRemz.m


A.2.1  Remez  Algorithm  With  Chebyshev  Initial  Points 
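
The listing that follows seeds the Remez iteration with order+2 points computed from a cosine rule on the interval [a, b]. As a quick illustration of that rule only, here is a minimal Python sketch (not the thesis's MATLAB); the function name chebyshev_initial_points is hypothetical, and the formula is taken from the xi assignment in chebyRemz.m.

```python
import math


def chebyshev_initial_points(a, b, order):
    """Return the order+2 starting points on [a, b].

    Point k is (a+b)/2 + (b-a)/2 * cos(k*pi/(order+1)), k = 0..order+1,
    so the list runs from b down to a.
    """
    return [(a + b) / 2 + (b - a) / 2 * math.cos(k * math.pi / (order + 1))
            for k in range(order + 2)]


# Four starting points for a quadratic (order 2) on [0, 1]
print(chebyshev_initial_points(0.0, 1.0, 2))
```

For a quadratic on [0, 1] the points are, up to rounding, 1, 0.75, 0.25 and 0: both end points are included and the interior points are spaced more closely toward the ends than the middle.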


FILE: chebyRemz.m

function [poly_coeff, oscil, snd_Err] = chebyRemz(fun,interval,order)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% chebyRemz.m
%
% Get chebyshev polynomial on the first iteration. Repeat for Remez
% application; User specifies the function to approximate.
%
% This program turns the function provided into an inline function.
%
% INPUT:
%   fun:        function entered by user (want to approximate this)
%               However this function cannot be a constant, f must
%               be only one variable. Must use the variable 'x'.
%   order:      order of approximation, e.g. 2nd order polynomial
%   interval:   range on which the coefficients will be
%               approximated on the user's function.
%
% OUTPUT:
%   errRemz:    error points for the range given
%   poly_coeff: These are the coefficients of the polynomial that
%               approximates the function.
%   oscil:      Oscillations on interval, for second order poly, we
%               want only 2 oscillations. In this case oscillations
%               are the zeroes of the first derivative.
%
% Author:        Njuguna Macaria
% Created:       20 February 2007
% Last Modified: 26 MARCH 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

a  =  interval ( 1 ) ; 

b  =  interval (2); 

N  =  500;  %  Number  of  elements  per  segment 

x  pts  =  linspace (a, b, N) ;  %  x  axis  sample  points 

y_act  =  fun(x_pts);  %  Evaluate  actual  function 

eps  =  ( -1 ) . A [ 0 : order+1 ] ;  %  Epsilon  for  coefficients  calculation 

p  track  =  [];  %  For  tracking  result  with  error 

O _ O, 

Q - O 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Estimate with Polyfit and get data
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% % % % pp = polyfit(x_pts, y_act, order);  % get polyfit coefficients
% % % % y_pfit = polyval(pp, x_pts);        % evaluate with polyfit coefficients
% % % % errPfit = y_pfit - y_act;           % get polyfit error values to compare

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Repeat powers of the polynomial in (order+2) rows and get the
%  initial x points
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
set = ones(order+2, 1)*([0:order+1]);
xi = (a+b)/2 + (b-a)/2*cos((set*pi)/(order+1));


%  Entering conditions for the loop. First loop is the Chebyshev polynomial
j = 1;
max_loops = 10;     % Max loops for Remez function
% % % % % ratio_error = 2;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Remez loop; however, the first set of coefficients are Chebyshev coefficients
%  Exit on these conditions: 1) Convergence  2) Greater than 9 iterations
%                            3) If we have an exact quadratic to approximate
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% % while (ratio_error > 1.00000001 || ratio_error < 0.9999999) && j<max_loops
while j < max_loops

    %  Extract set of initial points for evaluation (we'll use 4th column)
    %  Next, evaluate the points on the actual function
    N_p = [xi(1,1); xi(1,2); xi(1,3); xi(1,4)];
    F = fun(N_p);

    %  Raise x0, x1, x2, x3 to the respective powers
    A = (xi').^(set);
    A(:,4) = eps';

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %  Find Polynomial Coefficients
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    p = A\F;                    % 1st time = Chebyshev coefficients
    p_track = [p_track, p];     % Records error

    %  Remove err term; flip coefficients
    pflip = fliplr(p(1:end-1)');
    poly_coeff = pflip;


    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %  Calculate Plot Values
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    y_apprx = polyval(pflip, x_pts);    % evaluate with poly coefficients

    %  Calculate the Errors, break loop if
    %  1. function is already a Quadratic
    %  2. If convergence has been reached
    errRemz = y_apprx - y_act;
    max_Err = max(errRemz(2:end-1));    % Max error (exclude ends)
    min_Err = min(errRemz(2:end-1));    % Min error (exclude ends)
    if abs(max_Err) > abs(min_Err)      % Set the return value of error
        snd_Err = abs(max_Err);
    else
        snd_Err = abs(min_Err);
    end

    %  (3) Exit loop if function == quadratic (very very small error)
    if abs(max_Err) < 2^-40 && abs(min_Err) < 2^-40
        oscil = 0;
        % % % plot_cheby(x_pts, y_apprx, y_act, y_pfit, errRemz, errPfit);
        break;      % if exact polynomial is found!!!
    end

    %  (1) Exit loop on convergence (previous error equal to present)
    if j > 4
        comp1 = p_track(4,j);
        comp2 = p_track(4,j-1);
        if comp1 == comp2
            %  NOTE: y_pfit and errPfit exist only when the polyfit
            %  lines near the top of this file are enabled.
            plot_cheby(x_pts, y_apprx, y_act, y_pfit, errRemz, errPfit);
            break;
        end     % if comp1 == comp2
    end     % if j > 4


    %  Finding zeroes (Max & Min of error)
    err_der  = diff(errRemz);           % Find difference between adjacent
    err_sign = sign(err_der);           % points and determine the signs.
    err_sign = diff(err_sign);          % Find difference between signs
    errZer1  = find(err_sign == -2);    % Yields either 2 or -2 where the
    errZer2  = find(err_sign == 2);     % original function changed sign
    errZeros = [errZer1, errZer2];      % Matrix of where sign changed

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %  Exit Remez if too many Oscillations
    %  Provide Chebyshev Coefficients.
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    oscil = length(errZeros);
    if oscil > order
        fprintf('.')
        warning('Too many oscillations; Chebyshev Coefficients provided.')
        break;
    end


    %  Use max errors and replace x values
    new_x2 = find(errRemz == max_Err);  % Index of max error point
    new_x3 = find(errRemz == min_Err);  % Index of min error point
    %  Make sure to replace into the correct order on the range
    new_x2 = new_x2(1);     % In case there are multiple,
    new_x3 = new_x3(1);     % pick the first element
    if new_x2 > new_x3
        xi(:,2) = a+new_x2/N*(b-a);
        xi(:,3) = a+new_x3/N*(b-a);
    elseif new_x2 < new_x3
        xi(:,2) = a+new_x3/N*(b-a);
        xi(:,3) = a+new_x2/N*(b-a);
    end     % end if new_x2 > new_x3 statement

    ratio_error = abs(max_Err)/abs(min_Err);
    ratio_err_track = [ratio_err_track, ratio_error];

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %  Plot actual vs the approx functions
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if mod(j,3)==1 || j==max_loops
        plot_cheby(x_pts, y_apprx, y_act, y_pfit, errRemz, errPfit);
        figure
        plot(x_pts, errFuncP)
    end     % end if mod(j,3)==1 || j==max_loops statement

    % % trackj = [trackj, j];
    j = j+1;
end     % while loop

% % % format long;
% % % ratio_err_track
% % % p_track
% % % trackj
% % % format short;
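
The first pass of the listing above (Chebyshev starting points followed by one linear solve) can be sketched in a few lines. This is a minimal illustration rather than the thesis routine: the example function exp(x), the interval [0, 1], and every variable name below are assumptions chosen for the sketch, and the later exchange iterations are omitted.

```matlab
% Sketch: first pass only (Chebyshev points, then one linear solve).
% f, a, b and all names here are illustrative assumptions.
f     = @(x) exp(x);                 % example function to approximate
a     = 0;  b = 1;                   % example segment
order = 2;                           % quadratic approximation
k     = (0:order+1)';                % point indices 0..3
xk    = (a+b)/2 + (b-a)/2*cos(k*pi/(order+1));   % Chebyshev extrema on [a,b]
% Each row enforces c0 + c1*x + c2*x^2 + (-1)^k*E = f(x) at one point.
A     = [xk.^0, xk.^1, xk.^2, (-1).^k];
sol   = A \ f(xk);                   % sol = [c0; c1; c2; E]
coef  = flipud(sol(1:3))';           % [c2 c1 c0], the order polyval expects
E     = sol(4);                      % levelled error at the four points
xs    = linspace(a, b, 500);
err   = polyval(coef, xs) - f(xs);   % sampled approximation error
fprintf('levelled error %.3e, max sampled error %.3e\n', abs(E), max(abs(err)));
```

The fourth unknown E is the error term that the listing strips off before flipping the coefficients; the sampled maximum error is at least as large as |E| until the exchange steps level it.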
A.2.1  Variable Length Approximation Speed-Up Algorithms


The following files are the programs used to speed up the segmentation. Seven are
presented here, labeled a through g. The first file is the one used for segmentation.
The others are available for the purpose of comparison. Only the first file is complete;
the other files show only the code that differs from the first one, i.e. the middle of
the file, which searches out the width for segmentation.
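
The width search that these files perform can be illustrated with a simplified sketch. It is not one of the thesis listings: the cubic error model `errOfLen` is a hypothetical stand-in for a call to the Remez routine, `maxLen` and the constant 100.5 are arbitrary, and the step rule (halve the step, grow while the error stays within epsilon) is a reduced form of the grow-and-shrink loops shown below.

```matlab
% Sketch: search for the widest segment whose error stays within epsilon.
% errOfLen is an assumed stand-in for one call to the Remez routine.
epsilon  = 1e-3;
errOfLen = @(len) epsilon*(len/100.5).^3;   % assumed: error grows with len^3
maxLen   = 500;                             % samples available
len      = maxLen;
calls    = 0;                               % accepted or rejected coarse steps
while errOfLen(len) > epsilon && len > 2    % coarse: shrink until it fits
    len   = round(len/2);
    calls = calls + 1;
end
step = len;
while step > 1                              % fine: grow with halving steps
    step = round(step/2);
    while len + step <= maxLen && errOfLen(len + step) <= epsilon
        len   = len + step;
        calls = calls + 1;
    end
end
fprintf('widest length %d samples after %d moves\n', len, calls);
```

With this error model the search settles on the largest length that the model accepts; each evaluation of `errOfLen` corresponds to one Remez call in the real files, which is why reducing the number of evaluations matters.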


a.  Hybrid of 3 estimates, average and thirds

FILE: varQuadApproxHyb3AvgThird.m


function [endpt, i, p, data_] = ...
    varQuadApproxHyb3AvgThird(x_pts, f3der, est_max_len, fct, epsilon, indx)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  varQuadApproxHyb3AvgThird.m
%
%  This function creates a 2nd order polynomial approximation of a given
%  function using the Remez algorithm. It continues to calculate Remez
%  approximations until epsilon is exceeded.
%
%  Remez approximations (with first approximation being a Chebyshev
%  polynomial approximation).
%
%  To reduce the loop time, we first approximate the length of the
%  proposed segment. We take 3 estimates, at the beginning, end and
%  middle. Take the average of these 3. Then evaluate all the points
%  on the proposed length and get set of estimated lengths.
%
%  Take the average of all these estimates. This is the proposed length
%  to be used.
%
%  INPUT:
%  fct:      function entered by user (want to approximate this).
%            However, this function cannot be a constant; f must
%            be only one variable. Must use the variable 'x'.
%  x_pts:    All the x-axis points on which to evaluate the
%            function.
%  indx:     index at which to start the interval of x values
%  epsilon:  maximum error that the user wants to limit the
%            approximated function.
%
%  OUTPUT:
%  endpt:    end point of the segment
%  i:        Index at which we stopped the function approximated
%  p:        coefficient for polynomial approximation
%            p(1) is the x^2 coeff, p(2) is the x coeff and
%            p(3) is the constant term in the 2nd order poly
%
%  Modified by Njuguna Macaria
%  Modified:       FEB 2, 2007
%  Last Modified:  APR 1, 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

syms x
order          = 2;                     % Set the order of the polynomial
errStop        = 0;                     % To see if we exceeded epsilon
loopt          = 1;                     % track times Remez is called
data_          = [];                    % Final loop count accumulated
x_ptsRange     = x_pts(end)-x_pts(1);   % Basically (b-a)
start_interval = x_pts(indx);           % Start of this segment interval


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    ESTIMATION    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%   Using Average after 3 Est   %%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
abs_f3der = abs(f3der(start_interval));
if abs_f3der == 0
    len = round(.086*length(x_pts));    % Close, but ends up being increased
else
    x_range1 = 4*(epsilon*3/abs_f3der)^(1/3);
    len1 = round(x_range1/(x_ptsRange)*length(x_pts));
    if len1+indx > length(x_pts)
        len = length(x_pts) - indx;
    else
        abs_f3der = abs(f3der(x_pts(indx+len1)));
        if abs_f3der == 0
            len = round(.086*length(x_pts));
        else
            x_range2 = 4*(epsilon*3/abs_f3der)^(1/3);
            len2 = round(x_range2/(x_ptsRange)*length(x_pts));
            len_mid = round((len1+len2)/4);
            abs_f3der = abs(f3der(x_pts(indx+len_mid)));
            if abs_f3der == 0
                len = round(.086*length(x_pts));
            else
                x_range3 = 4*(epsilon*3/abs_f3der)^(1/3);
                len3 = round(x_range3/(x_ptsRange)*length(x_pts));
                len = round((len1 + len2 + len3)/3);
            end
        end
        if len+indx > length(x_pts)
            len = length(x_pts) - indx;
        end
        Der3Intr = f3der(x_pts(indx:indx+len));     % Get third derivatives
        AV3DER = mean(Der3Intr);                    % Average them all
        x_range = 4*(epsilon*3/abs(AV3DER))^(1/3);  % Get new X range value
        len = round(x_range/(x_ptsRange)*length(x_pts));    % Best len
        if len+indx > length(x_pts)
            len = length(x_pts) - indx;
        elseif len > est_max_len*10     % When 3rd Derivative is small
            len = est_max_len;
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

interval = [start_interval, x_pts(len+indx)];
[p, oscil, errP] = chebyRemz(fct, interval, order);
max_Perr = errP;

LOOK = max_Perr/epsilon;

if abs_f3der == 0 || LOOK < 0.9 || LOOK > 1.002
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %  Find a good place to start indexing
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if abs_f3der == 0
        while (max_Perr > epsilon) && len > 2
            len = ceil(len/3);
            if len+indx > length(x_pts)
                len = length(x_pts) - indx;
                break;
            end
            interval = [start_interval, x_pts(indx+len)];
            [p, oscil, errP] = chebyRemz(fct, interval, order);
            max_Perr = errP;
            loopt = loopt + 1;
        end     % while max_Perr > epsilon
        incrementLen = len;
    else
        incrementLen = ceil(len*.05);
    end     % if abs_f3der == 0

    while incrementLen > 2
        incrementLen = ceil(incrementLen/3);
        while (max_Perr < epsilon) && len > 2
            len = len + incrementLen;
            if len+indx > length(x_pts)
                len = length(x_pts) - indx;
                break;
            end
            interval = [start_interval, x_pts(indx+len)];
            [p, oscil, errP] = chebyRemz(fct, interval, order);
            max_Perr = errP;
            loopt = loopt + 1;
        end     % while max_Perr < epsilon

        incrementLen = ceil(incrementLen/3);
        while (max_Perr > epsilon) && len > 2
            len = len - incrementLen;
            interval = [start_interval, x_pts(indx+len)];
            [p, oscil, errP] = chebyRemz(fct, interval, order);
            max_Perr = errP;
            loopt = loopt + 1;
            if incrementLen < 2
                break;
            end
        end     % while max_Perr > epsilon
    end     % end while incrementLen > 2
end     % if


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%  Step from indx + len
if max_Perr > epsilon       % Since we exceeded, go backwards
    i = indx+len;           % Jump to the estimated length
    errStop = 2*epsilon;    % Increase to prevent premature stop
    while i < length(x_pts)
        if errStop < epsilon
            i = i+1;            % This was the point evaluated before
            endpt = x_pts(i);   % the decrement at the end of this
                                % while loop. Restore index i and all
                                % associated data.
            fid = fopen('CompareLoop.txt','a');
            data_ = [data_ loopt];
            Der3Intr = f3der(x_pts(indx:indx+len));
            AV3DER = mean(Der3Intr);
            fprintf(fid, ['\n%4d %4d len: %5d i: %5d', ...
                ' avg: %10.5f LOOK: %8.6f MORE'], ...
                i, loopt, len, i-indx, AV3DER, LOOK);
            fclose(fid);
            return
        end
        loopt = loopt + 1;
        interval = [start_interval, x_pts(i)];
        [p, oscil, errP] = chebyRemz(fct, interval, order);
        errStop = errP;
        i = i - 1;
    end


else
    for i = indx+len:length(x_pts)      % Since we were short, go forward
        %  First time thru, skip this if statement
        %  If exceeded the max error, then go back to the previous endpoint
        if errStop > epsilon
            i = i-2;                    % Get back to within Error
            endpt = x_pts(i);
            interval = [start_interval, x_pts(i)];
            [p, oscil, errP] = chebyRemz(fct, interval, order);
            fid = fopen('CompareLoop.txt','a');
            data_ = [data_ loopt];
            Der3Intr = f3der(x_pts(indx:indx+len));
            AV3DER = mean(Der3Intr);
            fprintf(fid, ['\n%4d %4d len: %5d i: %5d', ...
                ' avg: %10.5f LOOK: %8.6f LESS'], ...
                i, loopt, len, i-indx, AV3DER, LOOK);
            fclose(fid);
            return
        end     % if error > max
        loopt = loopt + 1;
        interval = [start_interval, x_pts(i)];
        [p, oscil, errP] = chebyRemz(fct, interval, order);
        errStop = errP*1.05;            % reduces the iterations
    end
end     % max_Perr > epsilon.  % for i=indx+1:length(fct)

fid = fopen('CompareLoop.txt','a');
data_ = [data_ loopt];
fprintf(fid, '\n%4d %4d', i, loopt);
fclose(fid);
endpt = x_pts(i);

%  END OF FILE: varQuadApproxHyb3AvgThird.m
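
The length estimate used at the top of this file, `4*(epsilon*3/abs_f3der)^(1/3)`, is consistent with the usual bound for a quadratic fitted at Chebyshev points on a segment of width w: the error is roughly |f'''|*w^3/192, and setting that equal to epsilon gives w = 4*(3*epsilon/|f'''|)^(1/3). The sketch below checks that relation on an example. The function exp(x), the grid size, and the value of epsilon are assumptions made for illustration only.

```matlab
% Sketch: first guess of a segment length from the third derivative.
% The function, epsilon and grid below are illustrative assumptions.
f3      = @(x) exp(x);                 % third derivative of exp(x)
epsilon = 2^-20;                       % target maximum error
x_pts   = linspace(0, 1, 4001);        % sample grid over the domain
indx    = 1;                           % segment starts at the first sample
d3      = abs(f3(x_pts(indx)));        % |f'''| at the segment start
% Width at which the bound d3*w^3/192 reaches epsilon.
w       = 4*(3*epsilon/d3)^(1/3);
bound   = d3*w^3/192;                  % should reproduce epsilon
len     = round(w/(x_pts(end)-x_pts(1))*length(x_pts));   % width in samples
len     = min(len, length(x_pts) - indx);                 % stay on the grid
fprintf('estimated width %.4f (%d samples), bound %.3e\n', w, len, bound);
```

Because the third derivative changes along the segment, a single estimate taken at the start is only a first guess; that is the motivation for averaging several estimates before the search begins.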


b.  Binary Search


FILE: varQuadApproxBinSearch.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%   BIN SEARCH   %%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
while (max_Perr > epsilon) && len > 2
    len = round(len/2);
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end     % while max_Perr > epsilon

incrementLen = len;

while incrementLen > 2
    incrementLen = round(incrementLen/2);
    while (max_Perr < epsilon) && len > 1
        len = len + incrementLen;
        if len+indx > length(x_pts)
            len = length(x_pts) - indx;
            break;
        end
        interval = [start_interval, x_pts(indx+len)];
        [p, oscil, errP] = chebyRemz(fct, interval, order);
        max_Perr = errP;
        loopt = loopt + 1;
    end     % while max_Perr < epsilon

    incrementLen = round(incrementLen/2);

    while (max_Perr > epsilon) && len > 1
        len = len - incrementLen;
        interval = [start_interval, x_pts(indx+len)];
        [p, oscil, errP] = chebyRemz(fct, interval, order);
        max_Perr = errP;
        loopt = loopt + 1;
        if incrementLen == 1
            break;
        end
    end     % while max_Perr > epsilon
end     % end while incrementLen > 2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


c.  Thirds


FILE: varQuadApproxTHIRD.m

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%    THIRDS    %%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
while (max_Perr > epsilon) && len > 2
    len = round(len/3);
    if len+indx > length(x_pts)
        len = length(x_pts) - indx;
        break;
    end
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end     % while max_Perr > epsilon

incrementLen = len;

while incrementLen > 2
    incrementLen = round(incrementLen/3);
    while (max_Perr < epsilon) && len > 2
        len = len + incrementLen;
        if len+indx > length(x_pts)
            len = length(x_pts) - indx;
            break;
        end
        interval = [start_interval, x_pts(indx+len)];
        [p, oscil, errP] = chebyRemz(fct, interval, order);
        max_Perr = errP;
        loopt = loopt + 1;
    end     % while max_Perr < epsilon

    incrementLen = round(incrementLen/3);

    while (max_Perr > epsilon) && len > 2
        len = len - incrementLen;
        interval = [start_interval, x_pts(indx+len)];
        [p, oscil, errP] = chebyRemz(fct, interval, order);
        max_Perr = errP;
        loopt = loopt + 1;
        if incrementLen < 3
            break;
        end
    end     % while max_Perr > epsilon
end     % end while incrementLen > 2

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


d.  Ratios


FILE: varQuadApproxRatio.m


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%    RATIOS    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

len = length(x_pts)-indx;
max_Perr = 100;
LOOK = 0;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Find a good place to start indexing
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
while (max_Perr > epsilon) && len > 2
    len = floor(len/3);
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end

while (max_Perr < epsilon) && len > 2
    len = ceil(len*1.2);
    if len+indx > length(x_pts)
        len = length(x_pts) - indx;
        break;
    end
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end     % while max_Perr < epsilon

while (max_Perr > epsilon) && len > 2
    len = floor(len*.95);
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end     % while max_Perr > epsilon

while (max_Perr < epsilon) && len > 2
    len = ceil(len*1.01);
    if len+indx > length(x_pts)
        len = length(x_pts) - indx;
        break;
    end
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end     % while max_Perr < epsilon

while (max_Perr > epsilon) && len > 2
    len = floor(len*.999);
    if len+indx > length(x_pts)
        len = length(x_pts) - indx;
        break;
    end
    interval = [start_interval, x_pts(indx+len)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    max_Perr = errP;
    loopt = loopt + 1;
end     % while max_Perr > epsilon
% end   % if

interval = [start_interval, x_pts(len+indx)];
[p, oscil, errP] = chebyRemz(fct, interval, order);
max_Perr = errP;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


e.  1 estimate


FILE: varQuadApprox1.m


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    ESTIMATION    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    Using 1 Est   %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

abs_f3der = abs(f3der(start_interval));
if abs_f3der == 0
    len = round(.086*length(x_pts));    % Close, but ends up being increased
else
    x_range1 = 4*(epsilon*3/abs_f3der)^(1/3);
    len = round(x_range1/(x_ptsRange)*length(x_pts));
    if len+indx > length(x_pts)
        len = length(x_pts) - indx;
    end
end

interval = [start_interval, x_pts(len+indx)];
[p, oscil, errP] = chebyRemz(fct, interval, order);
max_Perr = errP;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


f.  2 estimates


FILE: varQuadApprox2.m


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    ESTIMATION    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    Using 2 Est   %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

abs_f3der = abs(f3der(start_interval));
if abs_f3der == 0
    len = round(.086*length(x_pts));    % Close, but ends up being increased
else
    x_range1 = 4*(epsilon*3/abs_f3der)^(1/3);
    len1 = round(x_range1/(x_ptsRange)*length(x_pts));
    if len1+indx > length(x_pts)
        len = length(x_pts) - indx;
    else
        abs_f3der = abs(f3der(x_pts(indx+len1)));
        if abs_f3der == 0
            len = est_max_len;
        else
            x_range2 = 4*(epsilon*3/abs_f3der)^(1/3);
            len2 = round(x_range2/(x_ptsRange)*length(x_pts));
            len = round((len1+len2)/2);
        end
    end
    if len+indx > length(x_pts)
        len = length(x_pts) - indx;
    end
end

interval = [start_interval, x_pts(len+indx)];
[p, oscil, errP] = chebyRemz(fct, interval, order);
max_Perr = errP;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


g.  3 estimates


FILE: varQuadApprox3.m


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    ESTIMATION    %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%    Using 3 Est   %%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

abs_f3der = abs(f3der(start_interval));
if abs_f3der == 0
    len = round(.086*length(x_pts));    % Close, but ends up being increased
else
    x_range1 = 4*(epsilon*3/abs_f3der)^(1/3);
    len1 = round(x_range1/(x_ptsRange)*length(x_pts));
    if len1+indx > length(x_pts)
        len = length(x_pts) - indx;
    else
        abs_f3der = abs(f3der(x_pts(indx+len1)));
        if abs_f3der == 0
            len = est_max_len;
        else
            x_range2 = 4*(epsilon*3/abs_f3der)^(1/3);
            len2 = round(x_range2/(x_ptsRange)*length(x_pts));
            len_mid = round((len1 + len2)/4);
            abs_f3der = abs(f3der(x_pts(indx+len_mid)));
            if abs_f3der == 0
                len = est_max_len;
            else
                x_range3 = 4*(epsilon*3/abs_f3der)^(1/3);
                len3 = round(x_range3/(x_ptsRange)*length(x_pts));
                len = round((len1 + len2 + len3)/3);
            end
        end
        if len+indx > length(x_pts)
            len = length(x_pts) - indx;
        elseif len > est_max_len*10     % When 3rd Derivative is small
            len = est_max_len;
        end
    end
end

interval = [start_interval, x_pts(len+indx)];
[p, oscil, errP] = chebyRemz(fct, interval, order);
max_Perr = errP;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%    PINPOINT    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

A.2.2  Non-Uniform Quadratic Approximation

This is the file that keeps track of the segments computed and the associated
endpoints and coefficients. The data is sent back to the main function,
QuadAppxRemz.m. From this file we call varQuadApproxHyb3AvgThird.m or any of the
other varQuadApprox* files, depending on which one we want to use.
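
The bookkeeping described above can be sketched as a loop that asks a segment finder where the current segment stops and records the result. This is a simplified illustration, not the listing that follows: `findSegmentEnd` is a hypothetical stand-in for a varQuadApprox* routine (here it simply advances 150 samples), and the coefficient arrays are left out.

```matlab
% Sketch: outer loop that walks the domain one segment at a time.
% findSegmentEnd is a hypothetical stand-in for a varQuadApprox* routine.
xpts           = linspace(0, 1, 1001);
findSegmentEnd = @(i) min(i + 150, length(xpts));   % assumed fixed-size step
indx   = 1;                 % indx(k) is the index where segment k starts
endpt  = [];                % right end point of each segment
seg_no = 1;
i      = 1;
while i < length(xpts)
    i              = findSegmentEnd(i);   % index where this segment stops
    indx(seg_no+1) = i;                   % it is also the next start
    endpt(seg_no)  = xpts(i);
    seg_no         = seg_no + 1;
end
seg_lengths = diff(indx);   % number of samples covered by each segment
fprintf('%d segments, last end point %.3f\n', numel(endpt), endpt(end));
```

Each segment starts at the index where the previous one stopped, so the differences of `indx` give the segment lengths in samples and their sum covers the whole grid.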


FILE: multipleQuadApprox.m


function [endpt, indx, c2, c1, c0] = multipleQuadApprox(xpts, fct, epsilon)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  This function will produce multiple Quadratic-line approximations of a
%  given function to within the bounds of max error provided.
%
%  Created: January, 2007
%
%  INPUT:
%  fct:      function entered by user (want to approximate this).
%            However, this function cannot be a constant; f must
%            be only one variable. Must use the variable 'x'.
%  xpts:     All the x-axis points on which to evaluate the
%            function.
%  epsilon:  maximum error that the user wants to limit the
%            approximated function.
%
%  OUTPUT:
%  endpt:    end point of the segment
%  indx:     Array of all the index endpoints
%  c2:       Array of the x^2 polynomial coefficients
%  c1:       Array of the x polynomial coefficients
%  c0:       Array of the constant terms in the 2nd order poly
%
%  Modified: July 2, 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

syms x
format compact
i = 1;
seg_no = 1;
endpt = [];
c2 = [];
c1 = [];
c0 = [];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Find Max length Estimate. Will be
%  used if third derivative = 0, or if
%  it's really small (NOT YET IMPLEMENTED)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
fct_vec = inline(vectorize(fct));
abs_f3der = abs(diff(diff(diff(fct))));
abs_f3der_vec = inline(vectorize(abs_f3der));

%  Absolute Max 3rd derivative


f3der_pts  =  abs (abs_f 3der_vec (xpts) ) ; 
abs  f3der  max=  max(f3der  pts)  ; 
x_ptsRange  =  xpts (end) -xpts ( 1 )  ; 

xpts  min  seg  =  4* (epsilon*3/abs  f3der  max) A  (1/3);  %  smallest  seg  width 

min  seg  len  =  round(xpts  min  seg/x  ptsRange*length (xpts) ) ; 
xpts  avg  seg  =  4* (epsilon*3*x  ptsRange/... 

quadl (abs_f 3der_vec, xpts (1) , xpts (end) ) ) A (1/3) ; 
avg_seg  len  =  round (xpts_avg  seg/x  ptsRange*length (xpts) ) ; 
est  max  len  =  2*avg  seg  len  -  min  seg  len; 

%  If the function is sqrt(-log(x)), then make est_max_len the max size.
%  est_max_len calculated is not as large as the larger segments and will
%  slow down the program because of small estimates... Therefore:
if fct == sqrt(-log(x))
    est_max_len = length(xpts);
end

%  Sometimes the estimates are short. To prevent this from affecting the
%  program... est_max_len is increased * 10
%  est_max_len = 10*est_max_len;


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Get the values for each segment and     %
%  store them in the return vectors        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

indx(i) = 1;   % To include the first element, offset length by 1
while i < length(xpts)
    [endpt(seg_no), indx(seg_no+1), polyCoeff] = ...
        varQuadApproxHyb3AvgThird(xpts, abs_f3der_vec, ...
                                  est_max_len, fct_vec, epsilon, i)
    c2(seg_no) = polyCoeff(1);
    c1(seg_no) = polyCoeff(2);
    c0(seg_no) = polyCoeff(3);
    i = indx(seg_no+1);
    seg_no = seg_no + 1;
end

fprintf('\n\n******************End of Segmentation******************\n\n')
avg_seg_len
min_seg_len
est_max_len
Seg_lengths = diff(indx)
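
The segment-length estimates computed at the top of multipleQuadApprox.m can be checked in isolation. The sketch below is in Python rather than MATLAB, purely as an illustration. The function names are hypothetical, and the trapezoidal sum is an assumed stand-in for quadl. The width formula 4*(3*epsilon/|f'''|)^(1/3) is the same as requiring |f'''|*w^3/192 = epsilon, which is the usual minimax error estimate for a quadratic fit on a segment of width w.

```python
import math


def min_segment_width(epsilon, f3_max):
    """Width w satisfying f3_max * w**3 / 192 == epsilon."""
    return 4.0 * (3.0 * epsilon / f3_max) ** (1.0 / 3.0)


def matlab_round(value):
    """Round half up, which matches MATLAB round() for positive values."""
    return int(math.floor(value + 0.5))


def segment_length_estimates(xpts, abs_f3, epsilon):
    """Return (min_seg_len, avg_seg_len, est_max_len) in index points."""
    f3_pts = [abs(abs_f3(x)) for x in xpts]
    f3_max = max(f3_pts)
    x_range = xpts[-1] - xpts[0]
    n = len(xpts)
    # Trapezoidal rule over the sample grid (assumption: replaces quadl).
    integral = sum((xpts[k + 1] - xpts[k]) * (f3_pts[k] + f3_pts[k + 1]) / 2.0
                   for k in range(n - 1))
    w_min = min_segment_width(epsilon, f3_max)
    w_avg = 4.0 * (3.0 * epsilon * x_range / integral) ** (1.0 / 3.0)
    min_len = matlab_round(w_min / x_range * n)
    avg_len = matlab_round(w_avg / x_range * n)
    return min_len, avg_len, 2 * avg_len - min_len


if __name__ == "__main__":
    # f(x) = 2^x on [0,1]; its third derivative is ln(2)^3 * 2^x.
    grid = [k / 1000.0 for k in range(1001)]
    third = lambda x: math.log(2.0) ** 3 * 2.0 ** x
    print(segment_length_estimates(grid, third, 2.0 ** -16))
```

Because the average of |f'''| can never exceed its maximum, the average width is never smaller than the minimum width, so est_max_len = 2*avg_seg_len - min_seg_len is at least avg_seg_len.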


A.2.3  Uniform  Quadratic  Approximation 

FILE:  constantQuadApprox.m

function [endpt, indx, c2, c1, c0] = constantQuadApprox(x_pts, fct, constsegs)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  This function produces multiple Quadratic approximations of a
%  given function to within the bounds of the number of segments provided.
%  Coefficients calculated by Remez.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Created by Tom Mack for linear approximations, using polyfit
%  Created:  June 4, 2006
%  Modified for Quadratic approximations using Remez by Njuguna Macaria
%  Modified: July 11, 2006
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


syms x
order   = 2;
indx(1) = 1;

for i = 1:constsegs
    indx(i+1) = round((length(x_pts)/constsegs)*i);   % each iteration set size
    if i == constsegs
        indx(i+1) = length(x_pts);
    end
    endpt(i) = x_pts(indx(i+1));
    interval = [x_pts(indx(i)), endpt(i)];
    [p, oscil, errP] = chebyRemz(fct, interval, order);
    c2(i) = p(1);
    c1(i) = p(2);
    c0(i) = p(3);
    i = i + 1;
end
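
The index arithmetic of the uniform segmentation loop is easy to get wrong by one, so a small Python sketch of it may help. The function names are hypothetical and the sketch only reproduces the boundary computation, not the Remez fit (chebyRemz is not modelled). Indices are kept 1-based so that they can be compared directly with indx in the listing.

```python
import math


def uniform_boundaries(num_points, num_segments):
    """Return the 1-based boundary indices, starting at 1 and ending at num_points."""
    indx = [1]
    for i in range(1, num_segments + 1):
        if i == num_segments:
            # The last segment always ends on the last sample.
            indx.append(num_points)
        else:
            # floor(v + 0.5) mimics MATLAB round() for positive values.
            indx.append(int(math.floor(num_points / num_segments * i + 0.5)))
    return indx


def segment_intervals(x_pts, num_segments):
    """Return the (start, end) x-values of each uniform segment."""
    indx = uniform_boundaries(len(x_pts), num_segments)
    return [(x_pts[indx[k] - 1], x_pts[indx[k + 1] - 1])
            for k in range(num_segments)]


if __name__ == "__main__":
    print(uniform_boundaries(10, 3))
    print(segment_intervals([k / 10.0 for k in range(10)], 3))
```

Adjacent segments share their boundary sample, which is why each interval starts at the end point of the previous one.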


A.2.4  Uniform  Quadratic  Approximation  with  Constraints 


FILE:  constQuadAppxWErr.m


function [endpt, indx, c2, c1, c0] = constQuadAppxWErr(xpts, fct, epsilon)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  This function will produce multiple Quadratic-line approximations of a  %
%  constant size of a given function to within the bounds of the max error %
%  provided. Coefficients & intercept calculated using Chebychev and       %
%  Remez algorithm.                                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%  INPUT:   fct:      function entered by user (want to approximate this).
%                     However this function cannot be a constant, f must
%                     be only one variable. Must use the variable 'x'.
%           x_pts:    All the x-axis points on which to evaluate the
%                     function.
%           indx:     index at which to start the interval of x values
%           epsilon:  maximum error that the user wants to limit the
%                     approximated function.
%
%  OUTPUT:  endpt:    end point of the segment
%           c2:       Coefficients of x^2 in quadratic polynomial
%           c1:       Coefficients of x in quadratic polynomial
%           c0:       Constant of quadratic polynomial


%  Compute # of seg
%  Author: Njuguna Macaria                 Date: 5 July 2007

syms x

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Find Min length Estimate. Will be     %
%  the limiting length for uniform       %
%  implementation                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
fct_vec       = inline(vectorize(fct));         % vectorize fct (for eval)
abs_f3der     = abs(diff(diff(diff(fct))));     % symbolic 3rd derivative
abs_f3der_vec = inline(vectorize(abs_f3der));   % vectorize for evaluation
f3der_pts     = abs(abs_f3der_vec(xpts));       % evaluate to form vector
abs_f3der_max = max(f3der_pts);                 % abs (Max 3rd derivative)
x_ptsRange    = xpts(end) - xpts(1);            % Find length of x-domain


xpts_min_seg   = 4*(epsilon*3/abs_f3der_max)^(1/3);   % smallest domain len
est_min_seglen = floor(xpts_min_seg/x_ptsRange*length(xpts))   % in index pts

%  Find where this happens in the domain
IndxofMax  = find(f3der_pts == abs_f3der_max);   % Find min len on domain
numoftimes = length(IndxofMax);                  % How many are there?


%  Test begin, end and middle
est_max_len = length(xpts);     % dummy variable
if numoftimes > 1               % more than 1 Max point
    i = IndxofMax(1);           % i at begin of est seg
else
    i = IndxofMax;              % default
end
IndMaxtmp = i;                  % The new IndexofMax

if i > length(xpts) - est_min_seglen     % Check if truncated
    lenBegin = est_min_seglen;           % segment then fix
    lenMid   = est_min_seglen;           % both these estimates


%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Begin with the index of the highest 3rd derivative
[endpt, indx, p] = varQuadApproxHyb3AvgThird(xpts, ...
                        abs_f3der_vec, ...
                        est_max_len, ...
                        fct_vec, ...
                        epsilon, ...
                        i);
lenBegin = indx - i;


%%%%%%%%%%%%%%%%%%%%%%
%  index of the highest 3rd derivative in the middle
i = IndMaxtmp - floor(est_min_seglen/2);   % is at end of est seg
if i < 1       % Check to make sure not indexing before begin
    i = 1;     % of interval, if so start at begin of interval
end
[endpt, indx, p] = varQuadApproxHyb3AvgThird(xpts, ...
                        abs_f3der_vec, ...
                        est_max_len, ...
                        fct_vec, ...
                        epsilon, ...
                        i);
lenMid = indx - i;


%  end with the index of the highest 3rd derivative
i = IndMaxtmp - est_min_seglen;   % is at end of est seg
if i < 1       % Check to make sure not indexing before begin
    i = 1;     % of interval, if so start at begin of interval
end
[endpt, indx, p] = varQuadApproxHyb3AvgThird(xpts, ...
                        abs_f3der_vec, ...
                        est_max_len, ...
                        fct_vec, ...
                        epsilon, ...
                        i);
lenEnd = indx - i;
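
The listing tries the estimated minimum segment in three positions relative to the sample where the third derivative peaks: starting at the peak, centred on it, and ending at it. The Python sketch below reproduces only that start-index selection, under the assumption that the three blocks above are read correctly. The function name is hypothetical, and the call to varQuadApproxHyb3AvgThird is not modelled.

```python
def candidate_starts(f3_pts, est_min_seglen):
    """Return the 1-based start indices (begin, middle, end) of the trial segments."""
    f3_max = max(f3_pts)
    # First occurrence of the maximum, 1-based like MATLAB find().
    idx_max = f3_pts.index(f3_max) + 1
    begin = idx_max
    # Clamp to 1 so the trial segment never starts before the interval.
    middle = max(1, idx_max - est_min_seglen // 2)
    end = max(1, idx_max - est_min_seglen)
    return begin, middle, end


if __name__ == "__main__":
    samples = [1.0, 2.0, 5.0, 9.0, 9.0, 3.0]
    print(candidate_starts(samples, 4))
```

For each start index the MATLAB code then measures how far the segment actually extends (indx - i), giving lenBegin, lenMid and lenEnd.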


A.2.5  Fixed-Point  Decimal  to  HEXADECIMAL  or  BINARY 


FILE:  twosComp.m 


%function [hexX, decX, binX] = twosComp(x, intLen, mantisaLen)
function hexX = twosComp(x, intLen, mantisaLen)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  twosComp.m
%
%  This function converts any decimal number to two's complement binary
%  fi object.
%
%  function [hex, decX, binX] = twosComp(x, intLen, mantisaLen)
%
%  Input:   x:           The value to be converted
%           intLen:      User desired length of the integer portion of
%                        the number. How many bits are in the integer.
%           mantisaLen:  The length of the mantissa. The number of bits
%                        in the fraction section, the precision.
%  Output:  decX:        Decimal value as fi object. Integer and
%                        fraction as decimal representation.
%           binX:        Two's Complement of the input x. With integer
%                        portion represented with "intLen" bits and the
%                        fraction portion represented with "mantisaLen"
%                        bits.
%           hexX:        Two's Complement of the input x. Represented
%                        as a Hexadecimal value.
%
%  This function auto-aligns the decimal point.
%
%  Created by:  Njuguna Macaria
%  Date:        10 May 2007
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


totalLen = intLen + mantisaLen;   % Total bits desired to represent the nbr.
if totalLen > 128
    warning('Max Precision: 128bits. You have requested > 128 bits');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  fi Object: two's complement   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
decX  = fi(x, 1, totalLen, mantisaLen);   % Create fi object, display decimal
binX  = bin(decX);                        % Save and return a binary form
hexX  = hex(decX);
deciM = dec(decX);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%  Quantizer: two's complement   %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% % % q = quantizer('fixed', 'nearest', 'saturate', [totalLen mantisaLen])
% % % [a,b] = range(q)
% % % binX = num2bin(q,x)
% % % decX = bin2num(q,b)
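
For readers without the Fixed-Point Toolbox, the conversion performed by twosComp.m can be approximated in a few lines of Python. This is a sketch under stated assumptions, not a reimplementation of fi: it assumes round-to-nearest with ties rounded up, saturation on overflow, a word of intLen + mantisaLen bits that includes the sign bit, and lowercase hexadecimal output. The function name twos_comp_hex is hypothetical.

```python
import math


def twos_comp_hex(x, int_len, frac_len):
    """Return x as a two's complement fixed-point word in hexadecimal."""
    total = int_len + frac_len
    # Scale by 2^frac_len and round to nearest (ties toward +infinity).
    scaled = int(math.floor(x * (1 << frac_len) + 0.5))
    # Saturate to the representable signed range of a `total`-bit word.
    lowest = -(1 << (total - 1))
    highest = (1 << (total - 1)) - 1
    scaled = max(lowest, min(highest, scaled))
    # Masking maps negative values onto their two's complement pattern.
    word = scaled & ((1 << total) - 1)
    digits = (total + 3) // 4
    return format(word, "0{}x".format(digits))


if __name__ == "__main__":
    for value in (1.5, -1.0, 0.0001):
        print(value, twos_comp_hex(value, 16, 16))
```

With intLen = 16 and mantisaLen = 16 this produces the 32-bit words used by the multiplier in Appendix B, with the binary point in the middle.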
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A.2.6  User  Interface  and  Function  Information  Files 

FILE:  UserInput.m
function select = UserInput()

format long g

fprintf('\n\n')
fprintf('**************************************************************')
fprintf('\n\n')
fprintf('\n  QUADRATIC APPROXIMATION OF A FUNCTION USING CHEBYSHEV')
fprintf('\n  AND REMEZ ALGORITHM')
fprintf('\n')
fprintf('\n')


disp(' ')
disp('      Functions to be compared                  Interval')
disp('  1.  2^x                                       [0,1] ')
disp('  2.  1/x                                       [1,2] ')
disp('  3.  sqrt(x)                                   [1,2] ')
disp('  4.  1/sqrt(x)                                 [1,2] ')
disp('  5.  log2(x)                                   [1,2] ')
disp('  6.  log(x) = ln(x)                            [1,2] ')
disp('  7.  sin(pi*x)                                 [0,1/2] ')
disp('  8.  cos(pi*x)                                 [0,1/2] ')
disp('  9.  tan(pi*x)                                 [0,1/4] ')
disp(' 10.  sqrt(-log(x)) = sqrt(-ln(x))              [1/512,1/4] ')
disp(' 11.  tan(pi*x)^2 + 1                           [0,1/4] ')
disp(' 12.  -(x*log2(x) + (1-x)*log2(1-x))            [1/256,1-1/256] ')
disp(' 13.  1/(1+exp(-x)) = 1/(1+e^(-x))              [0,1] ')
disp(' 14.  (1/sqrt(2*pi))*exp(-x^2/2)                [0,sqrt(2)] ')
disp(' 15.  sin(exp(x))                               [0,2] ')
disp(' ')


%  Get FUNCTION to be approximated (user input)
select = input('Input the Function, func[sqrt(-1*log(x))] : ');
if isempty(select)
    select = 10;     % default
end


FILE:  getF.m 

function [func, interval, vari_or_const, err_or_segs, consegs, epsilon, N] = ...
         getF(fnc_choice)

syms x
interval    = '[1/256, 1/4]';   %% default
err_or_segs = 0;                %% default
consegs     = 200;              %% default
epsilon     = 0.0001;           %% default

switch fnc_choice
    case 1
        func     = '2^x';
        interval = '[0,1]';
    case 2
        func     = '1./x';
        interval = '[1,2]';
    case 3
        func     = 'sqrt(x)';
        interval = '[1,2]';
    case 4
        func     = '1/sqrt(x)';
        interval = '[1,2]';
    case 5
        func     = 'log2(x)';
        interval = '[1,2]';
    case 6
        func     = 'log(x)';
        interval = '[1,2]';
    case 7
        func     = 'sin(pi*x)';
        interval = '[0,1/2]';
    case 8
        func     = 'cos(pi*x)';
        interval = '[0,1/2]';
    case 9
        func     = 'tan(pi*x)';
        interval = '[0,1/4]';
    case 10
        func     = 'sqrt(-log(x))';
        interval = '[1/512,1/4]';
    case 11
        func     = 'tan(pi*x).^2 + 1';
        interval = '[0,1/4]';
    case 12
        func     = '-(x*log2(x) + (1-x)*log2(1-x))';
        interval = '[1/256,1-1/256]';
    case 13
        func     = '1/(1+exp(-x))';
        interval = '[0,1]';
    case 14
        func     = '(1/sqrt(2*pi))*exp(-x^2/2)';
        interval = '[0,sqrt(2)]';
    case 15
        func     = 'sin(exp(x))';
        interval = '[0,2]';
end   % switch fnc_choice


%  Get CONSTANT OR VARIABLE segmentation (User input)
vari_or_const = 0;
while vari_or_const ~= 1 && vari_or_const ~= 2 && vari_or_const ~= 3
    vari_or_const = input('(1)Non-uniform (2)Uniform Segmentation [1]: ');
    if isempty(vari_or_const)
        vari_or_const = 1;     % default Non-uniform
    end
end

%  If non-uniform segmentation, then enter ERROR parameters
if vari_or_const ~= 2
    epsilon = input('Input the Desired Error, epsilon[2^-33]: ');
    if isempty(epsilon)
        epsilon = 2^-33;       % default
    end
end

%  If uniform segmentation, find how the user will restrict # of segments
if vari_or_const == 2
    err_or_segs = input('Constrain by (1)Number of Segments or (2)Error [1]: ');
    if isempty(err_or_segs)
        err_or_segs = 1;       %% default
    end
    if err_or_segs == 1
        consegs = input('Input the number of Desired Segments[20]: ');
        if isempty(consegs)
            consegs = 20;      %% default
        end
    end
    if err_or_segs == 2
        epsilon = input('Input the given error; epsilon[2^-16]: ');
        if isempty(epsilon)
            epsilon = 2^-16;   %% default
        end
    end
end

N = input('Input the no. of pts the fct is to be evaluated, N[1000000]: ');
if isempty(N)
    N = 1000000;               %% default
end
APPENDIX  B.  HDL  CODE


B.1  MULTIPLIER  CODE

The VHDL code was adapted from Xilinx's application note on pipelining a
multiplier in the Virtex-II family of chips [22]. The code takes two 32-bit inputs
and produces one 32-bit product, with the binary point in the middle: a 16-bit
integer part and a 16-bit fraction.
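
The number format described above can be illustrated with a short Python sketch of a 16.16 fixed-point multiply built from 16-bit by 16-bit partial products. It treats the operands as unsigned for simplicity. How the thesis circuit combines its partial products is not shown in this excerpt, so the combination below is the textbook one and should be read as an assumption; all function names are hypothetical.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def to_q16_16(value):
    """Encode a non-negative real number as an unsigned 16.16 word."""
    return int(round(value * 65536)) & MASK32


def from_q16_16(word):
    """Decode an unsigned 16.16 word back to a float."""
    return word / 65536.0


def q16_16_mul(a, b):
    """Multiply two 16.16 words using four 16x16 partial products."""
    a_hi, a_lo = (a >> 16) & MASK16, a & MASK16
    b_hi, b_lo = (b >> 16) & MASK16, b & MASK16
    # Full 64-bit product assembled from the partial products.
    full = ((a_hi * b_hi) << 32) \
        + ((a_hi * b_lo + a_lo * b_hi) << 16) \
        + (a_lo * b_lo)
    # The 64-bit product has 32 fraction bits; drop the low 16 of them and
    # keep 32 bits so the binary point stays in the middle of the result.
    return (full >> 16) & MASK32


if __name__ == "__main__":
    product = q16_16_mul(to_q16_16(1.5), to_q16_16(2.25))
    print(hex(product), from_q16_16(product))
```

Keeping only the middle 32 bits of the 64-bit product means that results of 65536 or more overflow and that fraction bits below 2^-16 are truncated.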


1.  VHDL 


-- School:         NPS - Naval Postgraduate School, Monterey
-- Student:        Njuguna Macaria
--
-- Create Date:    14:10:56 07/07/07
-- Design Name:
-- Module Name:    mult_32to32 - Behavioral
-- Project Name:
-- Target Device:  xc2v6000-4ff1517 (Virtex II in SRC-6)
-- Tool versions:  Xilinx 6.303i and Synplicity 8.1
-- Simulation:     Modelsim and Synplicity's simulation tool
-- Description:
--
-- Dependencies:   Modified from
-- Revision:
-- Revision 0.01 - File Created
-- Additional Comments:


--*********************  COMPONENTS NEEDED  ***********************--

--  UNSIGNED 16 BIT MULTIPLIER  --

library ieee;
use ieee.std_logic_1164.all;
library UNISIM;
use UNISIM.vcomponents.all;

-- Entity: Description of pins (PORTS)
entity mult16_32 is
    port( au, bu : in  std_logic_vector(15 downto 0);
          clk    : in  std_logic;
          produ  : out std_logic_vector(31 downto 0));
end mult16_32;

architecture mult16_32_beh of mult16_32 is

component FDR
    port ( Q : out STD_ULOGIC;
           D : in  STD_ULOGIC;
           C : in  STD_ULOGIC;
           R : in  STD_ULOGIC);
end component;

component MULT18X18S
    port ( A  : in  STD_LOGIC_VECTOR (17 downto 0);
           B  : in  STD_LOGIC_VECTOR (17 downto 0);
           C  : in  STD_ULOGIC;
           CE : in  STD_ULOGIC;
           P  : out STD_LOGIC_VECTOR (35 downto 0);
           R  : in  STD_ULOGIC);
end component;

signal a_wire, b_wire : std_logic_vector(15 downto 0);
signal p_wire         : std_logic_vector(31 downto 0);
signal discard        : std_logic_vector( 3 downto 0);

attribute RLOC : string;
attribute RLOC of REG_A0  : label is "X0Y0";
attribute RLOC of REG_A1  : label is "X0Y0";
attribute RLOC of REG_A2  : label is "X0Y1";
attribute RLOC of REG_A3  : label is "X0Y1";
attribute RLOC of REG_A4  : label is "X0Y2";
attribute RLOC of REG_A5  : label is "X0Y2";
attribute RLOC of REG_A6  : label is "X0Y3";
attribute RLOC of REG_A7  : label is "X0Y3";
attribute RLOC of REG_A8  : label is "X0Y4";
attribute RLOC of REG_A9  : label is "X0Y4";
attribute RLOC of REG_A10 : label is "X0Y5";
attribute RLOC of REG_A11 : label is "X0Y5";
attribute RLOC of REG_A12 : label is "X0Y6";
attribute RLOC of REG_A13 : label is "X0Y6";
attribute RLOC of REG_A14 : label is "X0Y7";
attribute RLOC of REG_A15 : label is "X0Y7";
-- attribute RLOC of REG_A16 : label is "X-1Y7";
-- attribute RLOC of REG_A17 : label is "X-1Y7";
attribute RLOC of REG_B0  : label is "X2Y0";
attribute RLOC of REG_B1  : label is "X2Y0";
attribute RLOC of REG_B2  : label is "X2Y1";
attribute RLOC of REG_B3  : label is "X2Y1";
attribute RLOC of REG_B4  : label is "X2Y2";
attribute RLOC of REG_B5  : label is "X2Y2";
attribute RLOC of REG_B6  : label is "X2Y3";
attribute RLOC of REG_B7  : label is "X2Y3";
attribute RLOC of REG_B8  : label is "X2Y4";
attribute RLOC of REG_B9  : label is "X2Y4";
attribute RLOC of REG_B10 : label is "X2Y5";
attribute RLOC of REG_B11 : label is "X2Y5";
attribute RLOC of REG_B12 : label is "X2Y6";
attribute RLOC of REG_B13 : label is "X2Y6";
attribute RLOC of REG_B14 : label is "X2Y7";
attribute RLOC of REG_B15 : label is "X2Y7";
-- attribute RLOC of REG_B16 : label is "X-1Y6";
-- attribute RLOC of REG_B17 : label is "X-1Y6";
attribute RLOC of REG_P0  : label is "X-2Y0";
attribute RLOC of REG_P1  : label is "X1Y0";
attribute RLOC of REG_P2  : label is "X1Y0";
attribute RLOC of REG_P3  : label is "X1Y1";
attribute RLOC of REG_P4  : label is "X1Y1";
attribute RLOC of REG_P5  : label is "X3Y0";
attribute RLOC of REG_P6  : label is "X3Y0";
attribute RLOC of REG_P7  : label is "X3Y1";
attribute RLOC of REG_P8  : label is "X-2Y2";
attribute RLOC of REG_P9  : label is "X1Y2";
attribute RLOC of REG_P10 : label is "X1Y2";
attribute RLOC of REG_P11 : label is "X1Y3";
attribute RLOC of REG_P12 : label is "X1Y3";
attribute RLOC of REG_P13 : label is "X3Y2";
attribute RLOC of REG_P14 : label is "X3Y2";
attribute RLOC of REG_P15 : label is "X3Y3";
attribute RLOC of REG_P16 : label is "X-2Y4";
attribute RLOC of REG_P17 : label is "X1Y4";
attribute RLOC of REG_P18 : label is "X1Y4";
attribute RLOC of REG_P19 : label is "X1Y5";
attribute RLOC of REG_P20 : label is "X1Y5";
attribute RLOC of REG_P21 : label is "X3Y4";
attribute RLOC of REG_P22 : label is "X3Y4";
attribute RLOC of REG_P23 : label is "X3Y5";
attribute RLOC of REG_P24 : label is "X-2Y6";
attribute RLOC of REG_P25 : label is "X1Y6";
attribute RLOC of REG_P26 : label is "X1Y6";
attribute RLOC of REG_P27 : label is "X1Y7";
attribute RLOC of REG_P28 : label is "X1Y7";
attribute RLOC of REG_P29 : label is "X3Y6";
attribute RLOC of REG_P30 : label is "X3Y6";
attribute RLOC of REG_P31 : label is "X3Y7";
-- attribute RLOC of REG_P32 : label is "X3Y1";
-- attribute RLOC of REG_P33 : label is "X3Y3";
-- attribute RLOC of REG_P34 : label is "X3Y5";
-- attribute RLOC of REG_P35 : label is "X3Y7";

attribute BEL : string;
attribute BEL of REG_A0  : label is "FFX";
attribute BEL of REG_A1  : label is "FFY";
attribute BEL of REG_A2  : label is "FFX";
attribute BEL of REG_A3  : label is "FFY";
attribute BEL of REG_A4  : label is "FFX";
attribute BEL of REG_A5  : label is "FFY";
attribute BEL of REG_A6  : label is "FFX";
attribute BEL of REG_A7  : label is "FFY";
attribute BEL of REG_A8  : label is "FFX";
attribute BEL of REG_A9  : label is "FFY";
attribute BEL of REG_A10 : label is "FFX";
attribute BEL of REG_A11 : label is "FFY";
attribute BEL of REG_A12 : label is "FFX";
attribute BEL of REG_A13 : label is "FFY";
attribute BEL of REG_A14 : label is "FFX";
attribute BEL of REG_A15 : label is "FFY";
-- attribute BEL of REG_A16 : label is "
-- attribute BEL of REG_A17 : label is "
attribute BEL of REG_B0  : label is "FFX";
attribute BEL of REG_B1  : label is "FFY";
attribute BEL of REG_B2  : label is "FFX";
attribute BEL of REG_B3  : label is "FFY";
attribute BEL of REG_B4  : label is "FFX";
attribute BEL of REG_B5  : label is "FFY";
attribute BEL of REG_B6  : label is "FFX";
attribute BEL of REG_B7  : label is "FFY";
attribute BEL of REG_B8  : label is "FFX";
attribute BEL of REG_B9  : label is "FFY";
attribute BEL of REG_B10 : label is "FFX";
attribute BEL of REG_B11 : label is "FFY";
attribute BEL of REG_B12 : label is "FFX";
attribute BEL of REG_B13 : label is "FFY";
attribute BEL of REG_B14 : label is "FFX";
attribute BEL of REG_B15 : label is "FFY";
-- attribute BEL of REG_B16 : label is "
-- attribute BEL of REG_B17 : label is "
attribute BEL of REG_P0  : label is "FFY";
attribute BEL of REG_P1  : label is "FFX";
attribute BEL of REG_P2  : label is "FFY";
attribute BEL of REG_P3  : label is "FFX";
attribute BEL of REG_P4  : label is "FFY";
attribute BEL of REG_P5  : label is "FFX";
attribute BEL of REG_P6  : label is "FFY";
attribute BEL of REG_P7  : label is "FFX";
attribute BEL of REG_P8  : label is "FFY";
attribute BEL of REG_P9  : label is "FFX";
attribute BEL of REG_P10 : label is "FFY";
attribute BEL of REG_P11 : label is "FFX";
attribute BEL of REG_P12 : label is "FFY";
attribute BEL of REG_P13 : label is "FFX";
attribute BEL of REG_P14 : label is "FFY";
attribute BEL of REG_P15 : label is "FFX";
attribute BEL of REG_P16 : label is "FFY";
attribute BEL of REG_P17 : label is "FFX";
attribute BEL of REG_P18 : label is "FFY";
attribute BEL of REG_P19 : label is "FFX";
attribute BEL of REG_P20 : label is "FFY";
attribute BEL of REG_P21 : label is "FFX";
attribute BEL of REG_P22 : label is "FFY";
attribute BEL of REG_P23 : label is "FFX";
attribute BEL of REG_P24 : label is "FFY";
attribute BEL of REG_P25 : label is "FFX";
attribute BEL of REG_P26 : label is "FFY";
attribute BEL of REG_P27 : label is "FFX";
attribute BEL of REG_P28 : label is "FFY";
attribute BEL of REG_P29 : label is "FFX";
attribute BEL of REG_P30 : label is "FFY";
attribute BEL of REG_P31 : label is "FFX";
-- attribute BEL of REG_P32 : label is "FFY";
-- attribute BEL of REG_P33 : label is "FFY";
-- attribute BEL of REG_P34 : label is "FFY";
-- attribute BEL of REG_P35 : label is "FFY";

begin

REG_A0  : FDR port map (Q => a_wire(0),  C => CLK, D => au(0),  R => '0');
REG_A1  : FDR port map (Q => a_wire(1),  C => CLK, D => au(1),  R => '0');
REG_A2  : FDR port map (Q => a_wire(2),  C => CLK, D => au(2),  R => '0');
REG_A3  : FDR port map (Q => a_wire(3),  C => CLK, D => au(3),  R => '0');
REG_A4  : FDR port map (Q => a_wire(4),  C => CLK, D => au(4),  R => '0');
REG_A5  : FDR port map (Q => a_wire(5),  C => CLK, D => au(5),  R => '0');
REG_A6  : FDR port map (Q => a_wire(6),  C => CLK, D => au(6),  R => '0');
REG_A7  : FDR port map (Q => a_wire(7),  C => CLK, D => au(7),  R => '0');
REG_A8  : FDR port map (Q => a_wire(8),  C => CLK, D => au(8),  R => '0');
REG_A9  : FDR port map (Q => a_wire(9),  C => CLK, D => au(9),  R => '0');
REG_A10 : FDR port map (Q => a_wire(10), C => CLK, D => au(10), R => '0');
REG_A11 : FDR port map (Q => a_wire(11), C => CLK, D => au(11), R => '0');
REG_A12 : FDR port map (Q => a_wire(12), C => CLK, D => au(12), R => '0');
REG_A13 : FDR port map (Q => a_wire(13), C => CLK, D => au(13), R => '0');
REG_A14 : FDR port map (Q => a_wire(14), C => CLK, D => au(14), R => '0');
REG_A15 : FDR port map (Q => a_wire(15), C => CLK, D => au(15), R => '0');
-- REG_A16 : FDR port map(Q => a_wire(16), C => CLK, D => '0', R => '0');
-- REG_A17 : FDR port map(Q => a_wire(17), C => CLK, D => '0', R => '0');

REG_B0  : FDR port map (Q => b_wire(0),  C => CLK, D => bu(0),  R => '0');
REG_B1  : FDR port map (Q => b_wire(1),  C => CLK, D => bu(1),  R => '0');
REG_B2  : FDR port map (Q => b_wire(2),  C => CLK, D => bu(2),  R => '0');
REG_B3  : FDR port map (Q => b_wire(3),  C => CLK, D => bu(3),  R => '0');
REG_B4  : FDR port map (Q => b_wire(4),  C => CLK, D => bu(4),  R => '0');
REG_B5  : FDR port map (Q => b_wire(5),  C => CLK, D => bu(5),  R => '0');
REG_B6  : FDR port map (Q => b_wire(6),  C => CLK, D => bu(6),  R => '0');
REG_B7  : FDR port map (Q => b_wire(7),  C => CLK, D => bu(7),  R => '0');
REG_B8  : FDR port map (Q => b_wire(8),  C => CLK, D => bu(8),  R => '0');
REG_B9  : FDR port map (Q => b_wire(9),  C => CLK, D => bu(9),  R => '0');
REG_B10 : FDR port map (Q => b_wire(10), C => CLK, D => bu(10), R => '0');
REG_B11 : FDR port map (Q => b_wire(11), C => CLK, D => bu(11), R => '0');
REG_B12 : FDR port map (Q => b_wire(12), C => CLK, D => bu(12), R => '0');
REG_B13 : FDR port map (Q => b_wire(13), C => CLK, D => bu(13), R => '0');
REG_B14 : FDR port map (Q => b_wire(14), C => CLK, D => bu(14), R => '0');
REG_B15 : FDR port map (Q => b_wire(15), C => CLK, D => bu(15), R => '0');
-- REG_B16 : FDR port map(Q => b_wire(16), C => CLK, D => '0', R => '0');
-- REG_B17 : FDR port map(Q => b_wire(17), C => CLK, D => '0', R => '0');

Multi : MULT18X18S
    port map (P(31 downto 0) => p_wire, P(35 downto 32) => discard(3 downto 0),
              A(17 downto 16) => "00", A(15 downto 0) => a_wire,
              B(17 downto 16) => "00", B(15 downto 0) => b_wire,
              C  => CLK,
              CE => '1',
              R  => '0');

REG_P0  : FDR port map (Q => produ(0),  C => CLK, D => p_wire(0),  R => '0');
REG_P1  : FDR port map (Q => produ(1),  C => CLK, D => p_wire(1),  R => '0');
REG_P2  : FDR port map (Q => produ(2),  C => CLK, D => p_wire(2),  R => '0');
REG_P3  : FDR port map (Q => produ(3),  C => CLK, D => p_wire(3),  R => '0');
REG_P4  : FDR port map (Q => produ(4),  C => CLK, D => p_wire(4),  R => '0');
REG_P5  : FDR port map (Q => produ(5),  C => CLK, D => p_wire(5),  R => '0');
REG_P6  : FDR port map (Q => produ(6),  C => CLK, D => p_wire(6),  R => '0');
REG_P7  : FDR port map (Q => produ(7),  C => CLK, D => p_wire(7),  R => '0');
REG_P8  : FDR port map (Q => produ(8),  C => CLK, D => p_wire(8),  R => '0');
REG_P9  : FDR port map (Q => produ(9),  C => CLK, D => p_wire(9),  R => '0');
REG_P10 : FDR port map (Q => produ(10), C => CLK, D => p_wire(10), R => '0');
REG_P11 : FDR port map (Q => produ(11), C => CLK, D => p_wire(11), R => '0');
REG_P12 : FDR port map (Q => produ(12), C => CLK, D => p_wire(12), R => '0');
REG_P13 : FDR port map (Q => produ(13), C => CLK, D => p_wire(13), R => '0');
REG_P14 : FDR port map (Q => produ(14), C => CLK, D => p_wire(14), R => '0');
REG_P15 : FDR port map (Q => produ(15), C => CLK, D => p_wire(15), R => '0');
REG_P16 : FDR port map (Q => produ(16), C => CLK, D => p_wire(16), R => '0');
REG_P17 : FDR port map (Q => produ(17), C => CLK, D => p_wire(17), R => '0');
REG_P18 : FDR port map (Q => produ(18), C => CLK, D => p_wire(18), R => '0');
REG_P19 : FDR port map (Q => produ(19), C => CLK, D => p_wire(19), R => '0');
REG_P20 : FDR port map (Q => produ(20), C => CLK, D => p_wire(20), R => '0');
REG_P21 : FDR port map (Q => produ(21), C => CLK, D => p_wire(21), R => '0');
REG_P22 : FDR port map (Q => produ(22), C => CLK, D => p_wire(22), R => '0');
REG_P23 : FDR port map (Q => produ(23), C => CLK, D => p_wire(23), R => '0');
REG_P24 : FDR port map (Q => produ(24), C => CLK, D => p_wire(24), R => '0');
REG_P25 : FDR port map (Q => produ(25), C => CLK, D => p_wire(25), R => '0');
REG_P26 : FDR port map (Q => produ(26), C => CLK, D => p_wire(26), R => '0');
REG_P27 : FDR port map (Q => produ(27), C => CLK, D => p_wire(27), R => '0');
REG_P28 : FDR port map (Q => produ(28), C => CLK, D => p_wire(28), R => '0');
REG_P29 : FDR port map (Q => produ(29), C => CLK, D => p_wire(29), R => '0');
REG_P30 : FDR port map (Q => produ(30), C => CLK, D => p_wire(30), R => '0');
REG_P31 : FDR port map (Q => produ(31), C => CLK, D => p_wire(31), R => '0');
-- REG_P32 : FDR port map(Q => discard(3), C => CLK, D => p_wire(32), R => '0');
-- REG_P33 : FDR port map(Q => discard(2), C => CLK, D => p_wire(33), R => '0');
-- REG_P34 : FDR port map(Q => discard(1), C => CLK, D => p_wire(34), R => '0');
-- REG_P35 : FDR port map(Q => discard(0), C => CLK, D => p_wire(35), R => '0');

end mult16_32_beh;


-- 32 BIT MULTIPLIER --

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity mult_32to32 is
PORT (
    a, b : in  std_logic_vector(31 downto 0);
    clk  : in  std_logic;
    prod : out std_logic_vector(31 downto 0));
END mult_32to32;

architecture structural of mult_32to32 is

-- Declare component: unsigned 16-bit multiplier
component mult16_32
    port (au, bu : in  std_logic_vector(15 downto 0);
          clk    : in  std_logic;
          produ  : out std_logic_vector(31 downto 0));
end component;


-- Intermediate signals for multiplier stage
SIGNAL M00 : std_logic_vector(31 downto 0);
SIGNAL M01 : std_logic_vector(31 downto 0);
SIGNAL M10 : std_logic_vector(31 downto 0);
SIGNAL M02 : std_logic_vector(31 downto 0);
SIGNAL M11 : std_logic_vector(31 downto 0);
SIGNAL M20 : std_logic_vector(31 downto 0);

-- Intermediate signals for adding stage
SIGNAL A00 : std_logic_vector(33 downto 0);
SIGNAL A01 : std_logic_vector(49 downto 0);
SIGNAL A10 : std_logic_vector(49 downto 0);
SIGNAL A02 : std_logic_vector(65 downto 0);
SIGNAL A11 : std_logic_vector(65 downto 0);
SIGNAL A20 : std_logic_vector(65 downto 0);

-- Some definitions for implementing sign extend
SIGNAL ae : std_logic_vector(15 downto 0);
SIGNAL be : std_logic_vector(15 downto 0);

-- Signal to hold value (Synplify Pro will not work
-- if the width is not matched, Xilinx will)
SIGNAL prdt1 : std_logic_vector(49 downto 0);
SIGNAL prdt2 : std_logic_vector(65 downto 0);
SIGNAL prdt3 : std_logic_vector(65 downto 0);
SIGNAL prdt4 : std_logic_vector(65 downto 0);
SIGNAL prdt5 : std_logic_vector(65 downto 0);
SIGNAL prdt6 : std_logic_vector(65 downto 0);
-- SIGNAL b  : std_logic_vector(31 downto 0);


-- BEGIN the 32 bit Multiplier

BEGIN

PROCESS (clk)
    VARIABLE zer  : std_logic_vector(15 downto 0) := X"0000";  -- zeros
    VARIABLE ones : std_logic_vector(15 downto 0) := X"FFFF";  -- ones
BEGIN
    IF clk'event and clk = '1' THEN
        IF (a(15) = '1') THEN
            ae(15 downto 0) <= ones;
        ELSE
            ae(15 downto 0) <= zer;
        END IF;
        IF (b(15) = '1') THEN
            be(15 downto 0) <= ones;
        ELSE
            be(15 downto 0) <= zer;
        END IF;
    END IF;
END PROCESS;

-- Apply the Multiplies
U00 : mult16_32
    PORT MAP (au(15 downto 0)    => a(15 downto 0),
              bu(15 downto 0)    => b(15 downto 0),
              clk                => clk,
              produ(31 downto 0) => M00(31 downto 0));

U01 : mult16_32
    PORT MAP (au(15 downto 0)    => a(15 downto 0),
              bu(15 downto 0)    => b(31 downto 16),
              clk                => clk,
              produ(31 downto 0) => M01(31 downto 0));

U10 : mult16_32
    PORT MAP (au(15 downto 0)    => a(31 downto 16),
              bu(15 downto 0)    => b(15 downto 0),
              clk                => clk,
              produ(31 downto 0) => M10(31 downto 0));

U02 : mult16_32
    PORT MAP (au(15 downto 0)    => a(15 downto 0),
              bu(15 downto 0)    => be(15 downto 0),
              clk                => clk,
              produ(31 downto 0) => M02(31 downto 0));

U11 : mult16_32
    PORT MAP (au(15 downto 0)    => a(31 downto 16),
              bu(15 downto 0)    => b(31 downto 16),
              clk                => clk,
              produ(31 downto 0) => M11(31 downto 0));

U20 : mult16_32
    PORT MAP (au(15 downto 0)    => ae(15 downto 0),
              bu(15 downto 0)    => b(15 downto 0),
              clk                => clk,
              produ(31 downto 0) => M20(31 downto 0));
-- shift the values appropriately for addition
PROCESS (clk)
BEGIN
    IF clk'event and clk = '1' then
        A00(33 downto 32) <= "00";
        A00(31 downto 0)  <= M00(31 downto 0);

        A01(49 downto 48) <= "00";
        A01(47 downto 16) <= M01(31 downto 0);
        A01(15 downto 0)  <= X"0000";

        A10(49 downto 48) <= "00";
        A10(47 downto 16) <= M10(31 downto 0);
        A10(15 downto 0)  <= X"0000";

        A02(65 downto 64) <= "00";
        A02(63 downto 32) <= M02(31 downto 0);
        A02(31 downto 0)  <= X"00000000";

        A11(65 downto 64) <= "00";
        A11(63 downto 32) <= M11(31 downto 0);
        A11(31 downto 0)  <= X"00000000";

        A20(65 downto 64) <= "00";
        A20(63 downto 32) <= M20(31 downto 0);
        A20(31 downto 0)  <= X"00000000";
    END if;
END PROCESS;

PROCESS (clk)
BEGIN
    IF clk'event and clk = '1' then
        prdt1 <= unsigned(A00) + unsigned(A01) + unsigned(A10);
        prdt2 <= unsigned(A02) + unsigned(A11) + unsigned(A20);
        prdt3 <= unsigned(prdt2) + unsigned(prdt1);

        prod <= prdt3(47 downto 16);
    END IF;
END PROCESS;

END structural;


2.  Verilog 

// $Id: S_MULT_64TO64_SRC6.v,v 1.1 2007/06/25 18:20:29 pvg Exp $
//
// Copyright 2007 SRC Computers, Inc. All Rights Reserved.
//
// Manufactured in the United States of America.
//
// SRC Computers, Inc.
// 4240 N Nevada Avenue
// Colorado Springs, CO 80907
// (v) (719) 262-0213
// (f) (719) 262-0223
//
// No permission has been granted to distribute this software
// without the express permission of SRC Computers, Inc.
//
// This program is distributed WITHOUT ANY WARRANTY OF ANY KIND.
//
// DESCRIPTION: This module performs 64 bit signed integer multiplication
// and provides a 64 bit result.
// This module instantiates Xilinx components.
//------------------------------------------------------------------//
// This file was modified by Njuguna Macaria to make a 64 bit by 64 bit
// Multiplier with a 64 bit result that is shifted to the appropriate
// decimal point for a 32 bit integer and 32 bit fraction.
//
//------------------------------------------------------------------//
//------------------------------------------------------------------//
//                        32 BIT MULTIPLIER                         //
//------------------------------------------------------------------//


`timescale 1ns/1ns

module mult32_64s (A, B, Q, CLK, CLR);
input  [31:0] A;
input  [31:0] B;
output [63:0] Q;
input  CLK /* synthesis syn_noclockbuf=1 syn_maxfan=100000 */;
input  CLR;

reg  [63:0] Q;
reg  [31:0] AR;
reg  [31:0] BR;
wire [35:0] R0;
wire [35:0] R1;
wire [35:0] R2;
wire [35:0] R3;
reg  [31:0] R0_R;
reg  [31:0] R1_R;
reg  [31:0] R2_R;
reg  [31:0] R3_R;

always  @  (posedge  CLK  or  posedge  CLR) 
begin 

if  (CLR)  begin 
AR  <=  0; 

BR  <=  0; 

end 

else  begin 

AR  <=  A; 

BR  <=  B; 

end 

end 

MULT18X18S X0 (
    .A  ({2'b0, AR[15:0]}),
    .B  ({2'b0, BR[15:0]}),
    .C  (CLK),
    .R  (CLR),
    .CE (1'b1),
    .P  (R0)
);

MULT18X18S X1 (
    .A  ({2'b0, AR[31:16]}),
    .B  ({2'b0, BR[15:0]}),
    .C  (CLK),
    .R  (CLR),
    .CE (1'b1),
    .P  (R1)
);

MULT18X18S X2 (
    .A  ({2'b0, AR[15:0]}),
    .B  ({2'b0, BR[31:16]}),
    .C  (CLK),
    .R  (CLR),
    .CE (1'b1),
    .P  (R2)
);

MULT18X18S X3 (
    .A  ({2'b0, AR[31:16]}),
    .B  ({2'b0, BR[31:16]}),
    .C  (CLK),
    .R  (CLR),
    .CE (1'b1),
    .P  (R3)
);

always @ (posedge CLK or posedge CLR)
begin
    if (CLR) begin
        R0_R <= 0;
        R1_R <= 0;
        R2_R <= 0;
        R3_R <= 0;
    end
    else begin
        R0_R <= R0;
        R1_R <= R1;
        R2_R <= R2;
        R3_R <= R3;
    end
end

always @ (posedge CLK or posedge CLR)
begin
    if (CLR) begin
        Q <= 0;
    end
    else begin
        // add and shift
        Q <= R0_R + {R1_R, 16'b0} + {R2_R, 16'b0} + {R3_R, 32'b0};
    end
end

endmodule
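
The add-and-shift step of mult32_64s can be cross-checked in software. The C sketch below is not part of the thesis code: the function name mul32_via_16x16 is hypothetical, and both operands are treated as unsigned, which is an assumption based on the zero-extension {2'b0, ...} applied to each 16-bit half before it enters a MULT18X18S block. It ignores pipeline registers and models only the arithmetic.

```c
#include <stdint.h>

/* Hypothetical software model of the add-and-shift step in mult32_64s.
 * The 32-bit operands are split into 16-bit halves, four 16x16 partial
 * products are formed, and they are combined as
 *     R0 + (R1 << 16) + (R2 << 16) + (R3 << 32).
 * Assumption: operands are unsigned. */
uint64_t mul32_via_16x16(uint32_t a, uint32_t b)
{
    uint64_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint64_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    uint64_t r0 = a_lo * b_lo;   /* X0: A[15:0]  * B[15:0]  */
    uint64_t r1 = a_hi * b_lo;   /* X1: A[31:16] * B[15:0]  */
    uint64_t r2 = a_lo * b_hi;   /* X2: A[15:0]  * B[31:16] */
    uint64_t r3 = a_hi * b_hi;   /* X3: A[31:16] * B[31:16] */

    return r0 + (r1 << 16) + (r2 << 16) + (r3 << 32);
}
```

For unsigned inputs the result equals the native 64-bit product (uint64_t)a * b, so the model can serve as a reference when checking simulation output.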

//------------------------------------------------------------------//
//------------------------------------------------------------------//
//                        64 BIT MULTIPLIER                         //
//------------------------------------------------------------------//
//------------------------------------------------------------------//

`timescale 1ns/1ns

module mult_64s (A, B, Q, CLK, CLR);
input  [63:0] A;
input  [63:0] B;
output [63:0] Q;
input  CLK /* synthesis syn_noclockbuf=1 syn_maxfan=100000 */;
input  CLR;

reg  [127:0] Q_R;
reg  [63:0]  Q;
reg  [63:0]  AR;
reg  [63:0]  BR;
wire [63:0]  R0;
wire [63:0]  R1;
wire [63:0]  R2;
wire [63:0]  R3;
reg  [63:0]  R0_R;
reg  [63:0]  R1_R;
reg  [63:0]  R2_R;
reg  [63:0]  R3_R;

always @ (posedge CLK or posedge CLR)
begin
    if (CLR) begin
        AR <= 0;
        BR <= 0;
    end
    else begin
        AR <= A;
        BR <= B;
    end
end

mult32_64s X0 (
    .A   (AR[31:0]),
    .B   (BR[31:0]),
    .Q   (R0),
    .CLK (CLK),
    .CLR (CLR)
);

mult32_64s X1 (
    .A   (AR[63:32]),
    .B   (BR[31:0]),
    .Q   (R1),
    .CLK (CLK),
    .CLR (CLR)
);

mult32_64s X2 (
    .A   (AR[31:0]),
    .B   (BR[63:32]),
    .Q   (R2),
    .CLK (CLK),
    .CLR (CLR)
);

mult32_64s X3 (
    .A   (AR[63:32]),
    .B   (BR[63:32]),
    .Q   (R3),
    .CLK (CLK),
    .CLR (CLR)
);


always  @  (posedge  CLK  or  posedge  CLR) 
begin 

if  (CLR)  begin 
R0_R  <=  0; 

R1_R  <=  0; 

R2_R  <=  0; 

R3_R  <=  0; 

end 

else  begin 

R0_R  <=  R0; 

R1_R  <=  R1;

R2_R  <=  R2 ; 

R3_R  <=  R3 ; 

end 

end 

always  @  (posedge  CLK  or  posedge  CLR) 
begin 

if  (CLR)  begin 

Q  <=  0; 

end 

else  begin 

// add and shift
Q_R <= R0_R + {R1_R, 32'b0} + {R2_R, 32'b0} + {R3_R, 64'b0};
// Only take 64 bits from the middle for a 32.32 number
Q <= Q_R[95:32];

end 

end 

endmodule 
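
The selection of bits 95 down to 32 from the 128-bit product is what keeps the result in 32.32 format (32 integer bits, 32 fraction bits). The following C sketch illustrates that step under stated assumptions: the function name mul_q32_32 is hypothetical, the operands are treated as unsigned 32.32 values, and pipeline timing is ignored. It avoids a 128-bit type by observing that shifting the sum right by 32 only affects the lowest partial product.

```c
#include <stdint.h>

/* Hypothetical software model of the 64x64 multiply with a 32.32 result.
 * The full product is r0 + (r1 << 32) + (r2 << 32) + (r3 << 64).
 * Bits 95..32 of that sum equal
 *     (r0 >> 32) + r1 + r2 + (r3 << 32)   (mod 2^64),
 * because r1, r2 and r3 enter the sum as exact multiples of 2^32.
 * Assumption: operands are unsigned 32.32 fixed-point numbers. */
uint64_t mul_q32_32(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t r0 = a_lo * b_lo;   /* fraction * fraction */
    uint64_t r1 = a_hi * b_lo;   /* integer  * fraction */
    uint64_t r2 = a_lo * b_hi;   /* fraction * integer  */
    uint64_t r3 = a_hi * b_hi;   /* integer  * integer  */

    return (r0 >> 32) + r1 + r2 + (r3 << 32);
}
```

As a usage example, 1.5 is 0x180000000 and 2.0 is 0x200000000 in 32.32 format, and their product under this model is 0x300000000, which is 3.0.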


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APPENDIX  C.  SRC  C  CODE 


C.1  UNIFORM  SEGMENTATION
1.  Floating  Point
a.  Main.c

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <libmap.h>

// Subroutine initialization in Main
void subr_map (double acoef[],
               int ncoef,
               double incre,
               double offsetV,
               double x[],
               double y[],
               double ys[],
               int npts,
               int64_t *time0,
               int64_t *time1,
               int mapnum);

//  MAIN 
main () {

// Initialize Variables
FILE *fpl;
double *array, *x, *y, *ys, incre, val, offsetV;
int i, ir, nc, npts, mapnum, nmap, ncoef, arr_indx, inNum;
int64_t tm0, tm1;

// Start NFG and select map number
printf ("\n*** START UP THE NFG ***\n");
mapnum = 0;
nmap = 1;

// ! allocate map to this problem
map_allocate (nmap);


//  User  interface 

printf ("----------------------------------------\n");
printf ("Function 1.  2^x : 1\n");
printf ("Function 2.  1/x : 2\n");
printf ("Function 3.  sqrt(x) : 3\n");
printf ("Function 4.  1/sqrt(x) : 4\n");
printf ("Function 5.  log2(x) : 5\n");
printf ("Function 6.  ln(x) : 6\n");
printf ("Function 7.  sin(pi*x) : 7\n");
printf ("Function 8.  cos(pi*x) : 8\n");
printf ("Function 9.  tan(pi*x) : 9\n");
printf ("Function 10. sqrt(-ln(x)) : 10\n");
printf ("Function 11. tan(pi*x)^2 + 1 : 11\n");
printf ("Function 12. -(x*log2(x) + (1-x)*log2(1-x)) : 12\n");
printf ("Function 13. 1/(1+e^(-x)) : 13\n");
printf ("Function 14. (1/sqrt(2*pi))*exp(-x^2/2) : 14\n");
printf ("Function 15. sin(exp(x)) : 15\n");
printf ("----------------------------------------\n");

printf ("\nSelect which function to implement: ");
scanf ("%i", &inNum);
printf ("What value did I enter: %i \n ", inNum);

//  Open  the  Hex  data  file  to  read 
switch (inNum)
{
    case 1:  fpl = fopen ("Data/memD1.mem", "r");
             break;
    case 2:  fpl = fopen ("Data/memD2.mem", "r");
             break;
    case 3:  fpl = fopen ("Data/memD3.mem", "r");
             break;
    case 4:  fpl = fopen ("Data/memD4.mem", "r");
             break;
    case 5:  fpl = fopen ("Data/memD5.mem", "r");
             break;
    case 6:  fpl = fopen ("Data/memD6.mem", "r");
             break;
    case 7:  fpl = fopen ("Data/memD7.mem", "r");
             break;
    case 8:  fpl = fopen ("Data/memD8.mem", "r");
             break;
    case 9:  fpl = fopen ("Data/memD9.mem", "r");
             break;
    case 10: fpl = fopen ("Data/memD10.mem", "r");
             break;
    case 11: fpl = fopen ("Data/memD11.mem", "r");
             break;
    case 12: fpl = fopen ("Data/memD12.mem", "r");
             break;
    case 13: fpl = fopen ("Data/memD13.mem", "r");
             break;
    case 14: fpl = fopen ("Data/memD14.mem", "r");
             break;
    default: fpl = fopen ("Data/memD15.mem", "r");
             break;
}

// Print the file pointer (NULL means the file could not be opened)
printf ("fpl %p\n", (void *) fpl);

//  Read  in  the  values  from  the  file 
fscanf (fpl, "%i", &ncoef);
fscanf (fpl, "%lf", &incre);
fscanf (fpl, "%lf", &offsetV);

// Depending on number segments
//nc = 50;      // For 16 bit accuracy
//nc = 600;     // For 23 bits
//nc = 1500;    // For 32 bits
nc = 35000;     // For 40 bits

array = (double*) Cache_Aligned_Allocate (4*nc*8);
x     = (double*) Cache_Aligned_Allocate (nc*8);
y     = (double*) Cache_Aligned_Allocate (nc*8);
ys    = (double*) Cache_Aligned_Allocate (nc*8);

// check if the right thing was read
printf (" ncoef %i\n", ncoef);

// read_file: each segment has four values (endpoint, a, b, c)
for (i=0; i<ncoef; i++) {
    fscanf (fpl, "%lf", &val);
    array[i*4] = val;

    fscanf (fpl, "%lf", &val);
    array[i*4+1] = val;

    fscanf (fpl, "%lf", &val);
    array[i*4+2] = val;

    fscanf (fpl, "%lf", &val);
    array[i*4+3] = val;
} // end read_file
fclose (fpl);

npts = 30;

// create_samples: reuse the segment endpoints as test inputs
for (ir=0; ir<npts; ir++) {
    arr_indx = ir % ncoef;
    x[ir] = array[arr_indx*4];
    printf ("ir %3i x values are: %lf\n", ir, x[ir]);
} //end create_samples

printf ("main ncoef %i npts %i\n", ncoef, npts);

subr_map (array, ncoef, incre, offsetV, x, y, ys, npts, &tm0, &tm1,
          mapnum);

printf ("\n************ BACK FROM MAP **********\n");
printf ("%lld clocks for NFG\n", tm0);
printf ("%lld clocks for SRC Macro\n", tm1);

for (i=0; i<npts; i++) {
    printf ("x: %5.16lf ysubr: %5.16lf ySRCMacro: %5.16lf\n",
            x[i], y[i], ys[i]);
}

// ! release the map resources
map_free (nmap);
}


b.  subr.mc 

#include  <libmap.h> 


void subr_map (double ac[],
               int ncoef,
               double incre,
               double offsetV,
               double xc[],
               double yc[],
               double ys[],
               int npts,
               int64_t *time0,
               int64_t *time1,
               int mapno)
{
/******************************************************************
 * Declarations
 ******************************************************************/
OBM_BANK_A (ysmap, double, MAX_OBM_SIZE)
OBM_BANK_B (a, double, MAX_OBM_SIZE)
OBM_BANK_C (b, double, MAX_OBM_SIZE)
OBM_BANK_D (c, double, MAX_OBM_SIZE)
OBM_BANK_E (x, double, MAX_OBM_SIZE)
OBM_BANK_F (y, double, MAX_OBM_SIZE)

int i, j, nbytes, indx;
int64_t tm0, tm1;
double varx, indxtmp;

/*******************************************
 * Read in the coeff and segment endpoints
 *******************************************/
nbytes = 4*ncoef * 8;  /* 4 data values (seg,a,b,c), 64bits each */
DMA_CPU (CM2OBM, ysmap, MAP_OBM_stripe (1,"A,B,C,D"), ac, 1, nbytes, 0);
wait_DMA (0);

/******************************************************************
 * Read in the Number of points
 ******************************************************************/
nbytes = npts * 8;
DMA_CPU (CM2OBM, x, MAP_OBM_stripe (1,"E"), xc, 1, nbytes, 0);
wait_DMA (0);

/******************************************************************
 * Useful in Debug Mode to determine when in Map
 ******************************************************************/
printf ("\n\n************ NOW IN MAP **********\n");
printf ("MAP subr ncoef %i npts %i\n", ncoef, npts);

/******************************************************************
 * Read timer and use a constant for UNIFORM Segmentation
 ******************************************************************/
read_timer (&tm0);

printf ("incre: %15.10lf offset: %15.10lf\n", incre, offsetV);
for (i=0; i<npts; i++)
{
    varx = x[i];
    indxtmp = incre * varx;
    indx = (int)(indxtmp-offsetV);  // For interval [a,b]

    y[i] = a[indx]*varx*varx + varx*b[indx] + c[indx];

    // For Debug only
    printf ("indxtmp: %15.10lf indx: %i x: %15.10lf a: %15.10lf ",
            indxtmp, indx, varx, a[indx]);
    printf ("b: %15.10lf c: %15.10lf fx: %15.10lf\n",
            b[indx], c[indx], y[i]);
}

read_timer (&tm1);
*time0 = tm1-tm0;

read_timer (&tm0);
if (ncoef == 4017) {
    for (i=0; i<npts; i++)
        ysmap[i] = sqrt (-1*logf (x[i]));  // func 10
    //  ysmap[i] = cosf (x[i]*3.14159265358979);  // func 8
}
read_timer (&tm1);
*time1 = tm1 - tm0;

/******************************************************************
 * Send back the results
 ******************************************************************/
nbytes = npts * 8;
DMA_CPU (OBM2CM, y, MAP_OBM_stripe (1,"F"), yc, 1, nbytes, 0);
wait_DMA (0);

nbytes = npts * 8;
DMA_CPU (OBM2CM, ysmap, MAP_OBM_stripe (1,"A"), ys, 1, nbytes, 0);
wait_DMA (0);
}


c.  Sample memory file (memD13.mem)

23
23.000000000000000000
0.000000000000000000
0.043477043477043474  -0.001358317312431898  0.250022142664697360  0.499999946525873650
0.086956086956086961  -0.004069869528842258  0.250257856231724860  0.499994717163829820
0.130434130434130440  -0.006766102264296870  0.250726624819223480  0.499974235918064340
0.173912173912173920  -0.009436856238895420  0.251423133697143810  0.499928719944731090
0.217390217390217380  -0.012072268340317297  0.252339521794440860  0.499848954352031030
0.260869260869260880  -0.014662753872468844  0.253465478604624370  0.499726502843754640
0.304347304347304340  -0.017199021635054011  0.254788348118468570  0.499553907045252270
0.347825347825347850  -0.019672204841648999  0.256293304836314910  0.499324864391809510
0.391303391303391310  -0.022073970679148382  0.257963583077645050  0.499034376288904240
0.434782434782434780  -0.024396553731653503  0.259780689724019740  0.498678874917245770
0.478260478260478240  -0.026632698067803585  0.261724551035193380  0.498256341372862010
0.521738521738521750  -0.028775795055367724  0.263773815334481300  0.497766372051921200
0.565216565216565270  -0.030819956163696906  0.265906159795873900  0.497210208629803470
0.608695608695608680  -0.032759983918098333  0.268098508370670290  0.496590760719450740
0.652173652173652200  -0.034591349895399928  0.270327244592241440  0.495912606692870410
0.695651695651695600  -0.036310255047355668  0.272568518769947920  0.495181941782176340
0.739129739129739120  -0.037913669840421889  0.274798561824121660  0.494406487986867260
0.782608782608782640  -0.039399282695242094  0.276993876874879700  0.493595416118664640
0.826086826086826040  -0.040765458997357298  0.279131424123494290  0.492759249956049420
0.869564869564869560  -0.042011264516131290  0.281188891968362940  0.491909715721972900
0.913042913042913070  -0.043136460147426350  0.283144934523894110  0.491059574388106770
0.956521956521956480  -0.044141446445428945  0.284979312245647760  0.490222473401396690
1.000000000000000000  -0.045027205024233290  0.286673001843708750  0.489412798087615010
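
The sample file can be read as a lookup table: the three header values are the segment count, the increment, and the offset, and each following row holds a segment endpoint and the coefficients a, b, c. The C sketch below shows how the uniform-segment lookup in subr.mc uses such a table. It is an illustration only: the struct segment_t, the array sample_rows, and the function names are hypothetical, and only the first three rows of the file are copied in.

```c
/* Hypothetical container for one row of the memory file. */
typedef struct {
    double endpoint;   /* upper endpoint of the segment */
    double a, b, c;    /* quadratic coefficients        */
} segment_t;

/* First three rows of the sample file for function 13, 1/(1+e^(-x)). */
static const segment_t sample_rows[3] = {
    {0.043477043477043474, -0.001358317312431898,
     0.250022142664697360,  0.499999946525873650},
    {0.086956086956086961, -0.004069869528842258,
     0.250257856231724860,  0.499994717163829820},
    {0.130434130434130440, -0.006766102264296870,
     0.250726624819223480,  0.499974235918064340},
};

/* Uniform segmentation: the index is a scaled, truncated copy of x. */
int segment_index(double x, double incre, double offsetV)
{
    return (int)(incre * x - offsetV);
}

/* Evaluate a*x^2 + b*x + c using the coefficients of x's segment.
 * The caller must make sure the index stays inside the table. */
double eval_segment(const segment_t *tbl, double x,
                    double incre, double offsetV)
{
    int indx = segment_index(x, incre, offsetV);
    return tbl[indx].a * x * x + tbl[indx].b * x + tbl[indx].c;
}
```

With incre = 23 and offsetV = 0 from the file header, x = 0.05 maps to index 1 (the second row), and the quadratic gives a value close to 0.5124974, in line with 1/(1+e^(-0.05)).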


2.  Fixed  Point 


a.  Main.c 


#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <libmap.h>
#include <math.h>

// Subroutine initialization in Main
void subr_map (int64_t acoef[],
               int ncoef,
               int64_t incre,
               int64_t offsetV,
               int64_t x[],
               int64_t y[],
               int xpts,
               int64_t *time0,
               int mapnum);

// MAIN
main () {

// Initialize Variables
FILE *fpl;
int i, ir, nc, xpts, inNum;
int mapnum, nmap, ncoef;
int arr_indx;
int64_t *arraym, *xm, *ym, incre, offsetV;
int64_t tm0, tm1, hexval;
char hexstr[80], *token, *stpstr, strDelimit[] = " \n";

// Starting NFG
printf ("\n*** START UP THE NFG ***\n");
mapnum = 0;
nmap = 1;


//  User  interface 

printf ("----------------------------------------\n");
printf ("Function 1.  2^x : 1\n");
printf ("Function 2.  1/x : 2\n");
printf ("Function 3.  sqrt(x) : 3\n");
printf ("Function 4.  1/sqrt(x) : 4\n");
printf ("Function 5.  log2(x) : 5\n");
printf ("Function 6.  ln(x) : 6\n");
printf ("Function 7.  sin(pi*x) : 7\n");
printf ("Function 8.  cos(pi*x) : 8\n");
printf ("Function 9.  tan(pi*x) : 9\n");
printf ("Function 10. sqrt(-ln(x)) : 10\n");
printf ("Function 11. tan(pi*x)^2 + 1 : 11\n");
printf ("Function 12. -(x*log2(x) + (1-x)*log2(1-x)) : 12\n");
printf ("Function 13. 1/(1+e^(-x)) : 13\n");
printf ("Function 14. (1/sqrt(2*pi))*exp(-x^2/2) : 14\n");
printf ("Function 15. sin(exp(x)) : 15\n");
printf ("----------------------------------------\n");

//inNum = 1;  // dummy default value
printf ("\nSelect which function to implement: ");
scanf ("%i", &inNum);
printf ("What value did I enter: %i \n ", inNum);


//  Open  the  Hex  data  file  to  read 

switch (inNum)
{
    case 1:  fpl = fopen ("Data/memH1.mem", "r");
             break;
    case 2:  fpl = fopen ("Data/memH2.mem", "r");
             break;
    case 3:  fpl = fopen ("Data/memH3.mem", "r");
             break;
    case 4:  fpl = fopen ("Data/memH4.mem", "r");
             break;
    case 5:  fpl = fopen ("Data/memH5.mem", "r");
             break;
    case 6:  fpl = fopen ("Data/memH6.mem", "r");
             break;
    case 7:  fpl = fopen ("Data/memH7.mem", "r");
             break;
    case 8:  fpl = fopen ("Data/memH8.mem", "r");
             break;
    case 9:  fpl = fopen ("Data/memH9.mem", "r");
             break;
    case 10: fpl = fopen ("Data/memH10.mem", "r");
             break;
    case 11: fpl = fopen ("Data/memH11.mem", "r");
             break;
    case 12: fpl = fopen ("Data/memH12.mem", "r");
             break;
    case 13: fpl = fopen ("Data/memH13.mem", "r");
             break;
    case 14: fpl = fopen ("Data/memH14.mem", "r");
             break;
    default: fpl = fopen ("Data/memH15.mem", "r");
             break;
}

// Print the file pointer (NULL means the file could not be opened)
printf ("fpl %p\n", (void *) fpl);


// ! allocate map to this problem
map_allocate (nmap);

// Read in the number of segments (decimal #)
fscanf (fpl, "%i", &ncoef);
fscanf (fpl, "%llx", &incre);
fscanf (fpl, "%llx", &offsetV);

printf ("ncoef: %3i incre: %8llx\n", ncoef, incre);

// Accommodate lots of results
nc = 30000;

// array is enough room to hold 4 64 bit data pieces
// Perform cache alignment
arraym = (int64_t *) Cache_Aligned_Allocate (4*ncoef*8);
xm     = (int64_t *) Cache_Aligned_Allocate (nc*8);
ym     = (int64_t *) Cache_Aligned_Allocate (nc*8);

// Get rid of first npc
fgets (hexstr, sizeof hexstr, fpl);

// Read all endpoints and coefficients into OBM banks
for (i=0; i<ncoef; i++) {
    fgets (hexstr, sizeof hexstr, fpl);

    token = strtok (hexstr, strDelimit);
    sscanf (token, "%llx", &hexval);
    arraym[i*4] = hexval;

    token = strtok (NULL, strDelimit);
    sscanf (token, "%llx", &hexval);
    arraym[i*4+1] = hexval;

    token = strtok (NULL, strDelimit);
    sscanf (token, "%llx", &hexval);
    arraym[i*4+2] = hexval;

    token = strtok (NULL, strDelimit);
    sscanf (token, "%llx", &hexval);
    arraym[i*4+3] = hexval;
}
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//  close  the  file 
fclose (fpl);

// create some values to test with
xpts = 100;

for (ir=0; ir<xpts; ir++) {
    arr_indx = ir % ncoef;
    xm[ir] = arraym[arr_indx*4];  // Optional -0x2061d;
    printf ("arr_indx = %3i xm[%2i]= %10llx\n",
            arr_indx, ir, xm[ir]);
}

printf ("Right Before MAP *** \nmain ncoef %i xpts %i\n",
        ncoef, xpts);

subr_map (arraym, ncoef, incre, offsetV, xm, ym, xpts, &tm0, mapnum);

printf ("\n************Back from the MAP !!!********** \n");
printf ("\n************** SHIFT8 ***************\n");
printf ("%lld clocks\n", tm0);
for (i=0; i<xpts; i++) {
    printf ("i: %3i x: %8llx fx: %10llx\n", i, xm[i], ym[i]);
}

printf ("%lld clocks\n", tm1);

// ! release the map resources
map_free (nmap);
}


b.  subr.mc

#include <libmap.h>

void subr_map (int64_t ac[],
               int ncoef,
               int64_t incre,
               int64_t offsetV,
               int64_t xc[],
               int64_t yc[],
               int xpts,
               int64_t *time0,
               int mapno) {

/**********************************************************
 * Declarations
 **********************************************************/
OBM_BANK_A (segend, int64_t, MAX_OBM_SIZE)
OBM_BANK_B (a, int64_t, MAX_OBM_SIZE)
OBM_BANK_C (b, int64_t, MAX_OBM_SIZE)
OBM_BANK_D (c, int64_t, MAX_OBM_SIZE)
OBM_BANK_E (x, int64_t, MAX_OBM_SIZE)
OBM_BANK_F (y, int64_t, MAX_OBM_SIZE)

int i, j, nbytes;
int64_t tm0, tm1, varx, varsq, vara, varb, varc, ax2, bxl, fx;
int64_t varxtmp, indx;

/*********************************************
 * Read into OBM. Coeff & segment endpoints  *
 *********************************************/
// 4 data values (seg,a,b,c), 64bit Hex values
nbytes = 4*ncoef * 8;
DMA_CPU (CM2OBM, segend, MAP_OBM_stripe (1,"A,B,C,D"), ac, 1, nbytes, 0);
wait_DMA (0);

// Read in the Number of points
nbytes = xpts * 8;
DMA_CPU (CM2OBM, x, MAP_OBM_stripe (1,"E"), xc, 1, nbytes, 0);
wait_DMA (0);

// DEBUG: determine when in Map
printf ("\n\n************ NOW IN MAP **********\n");
printf ("MAP subr ncoef %i xpts %i \n", ncoef, xpts);

/**********************************************************
 * Read timer and use selector to determine the segment   *
 **********************************************************/
read_timer (&tm0);

incre >>= 16;    // asr to open integer bits
offsetV >>= 16;  // asr to match in subtraction

for (i=0; i<xpts; i++)
{
    varx = x[i];             // Take from OBM put in BRAM
    indx = varx * incre;     // Segment index Number * x input
    indx >>= 32;             // Return to 16 fraction points
    indx = indx - offsetV;   // Adjust index to interval start
    indx >>= 16;             // remove fraction
    vara = a[(int) indx];    // Move from OBM into BRAM
    varb = b[(int) indx];
    varc = c[(int) indx];

    // --- Square X and shift ---//
    varx >>= 8;              // Remove lower 8 bits. 40.24
    varsq = varx*varx;       // Now we have 80.48 -> 16.48
    varsq >>= 24;            // SRL eliminate 40.24
    if (varx < 0x8000000000000000)           // if varx is positive
        varsq = varsq & 0x000000FFFFFFFFFF;  // bitwise AND; 24bits

    // --- X^2 * first Coefficient ---//
    vara >>= 8;              // remove lower 8 bits. 40.24
    ax2 = varsq*vara;        // a[indx];
    ax2 >>= 16;              // Want 32.32, so srl 16
    if (vara < 0x8000000000000000)           // if both +ve
        ax2 = ax2 & 0x0000FFFFFFFFFFFF;      // bitwise AND; 16bits

    // --- X * second Coefficient ---//
    varb >>= 8;              // Remove lower 8 bits, 40.24
    bxl = varx*varb;         // both are already shifted
    bxl >>= 16;              // Return to 32.32 (int.fract)
    if (varb < 0x8000000000000000)           // if both +ve
        bxl = bxl & 0x0000FFFFFFFFFFFF;      // bitwise AND; 16bits

    // --- 3 input add to complete ---//
    y[i] = ax2+bxl+varc;     // no need to shift varc

    // DEBUG
    // printf ("indx: %4llx -> %4i varx: %6llx incre: %6llx\n",
    //         indx, (int) indx, varx, incre);
}

// Time it took to compute
read_timer (&tm1);
*time0 = tm1-tm0;

/* Send back the results */
nbytes = xpts * 8;
DMA_CPU (OBM2CM, y, MAP_OBM_stripe (1,"F"), yc, 1, nbytes, 0);
wait_DMA (0);
}
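
The shift sequence in the fixed-point loop is easier to follow when the number of fraction bits is tracked at each step. The C sketch below restates that arithmetic on a host machine; it is not the MAP code. The function names to_q32, seg_index_q32, and eval_q32 are hypothetical, the sign masks of the original are omitted, the input x is assumed to lie in [0, 1), and right shifts of negative values are assumed to be arithmetic (true for common compilers, but implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical helper: convert a double to 32.32 fixed point. */
int64_t to_q32(double v)
{
    return (int64_t)(v * 4294967296.0);   /* scale by 2^32 */
}

/* Segment index, following the shifts in the loop above.
 * incre and offsetV arrive in 32.32 format. */
int seg_index_q32(int64_t x, int64_t incre, int64_t offsetV)
{
    int64_t indx;
    incre   >>= 16;        /* 16 fraction bits                  */
    offsetV >>= 16;        /* 16 fraction bits                  */
    indx = x * incre;      /* 32 + 16 = 48 fraction bits        */
    indx >>= 32;           /* back to 16 fraction bits          */
    indx -= offsetV;       /* adjust to the interval start      */
    indx >>= 16;           /* drop the fraction                 */
    return (int)indx;
}

/* Evaluate a*x^2 + b*x + c, with all arguments in 32.32 format. */
int64_t eval_q32(int64_t x, int64_t a, int64_t b, int64_t c)
{
    int64_t x24 = x >> 8;                     /* 24 fraction bits   */
    int64_t sq  = (x24 * x24) >> 24;          /* 48 -> 24 bits      */
    int64_t ax2 = (sq * (a >> 8)) >> 16;      /* 48 -> 32 bits      */
    int64_t bx  = (x24 * (b >> 8)) >> 16;     /* 48 -> 32 bits      */
    return ax2 + bx + c;                      /* c is already 32.32 */
}
```

Dropping the low 8 bits of each operand before multiplying keeps every intermediate product within 64 bits, at the cost of a small truncation error in the result.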


C.2  NON-UNIFORM  SEGMENTATION 


1.  Floating  Point 
a.  Main.c 

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <libmap.h>

// Subroutine initialization in Main
void subr_map (double acoef[],
               int ncoef,
               double x[],
               double y[],
               double ys[],
               int npts,
               int64_t *time0,
               int64_t *time1,
               int mapnum);

//  MAIN 
main () {

// Initialize Variables
FILE *fpl;
double *array, *x, *y, *ys;
double val;
int i, ir, nc, npts, mapnum, nmap, ncoef, arr_indx;
int64_t tm0, tm1;

printf ("\n*** START UP THE NFG ***\n");

// select map number
mapnum = 0;
nmap = 1;

// ! allocate map to this problem
map_allocate (nmap);

// Depending on number segments
//nc = 50;      // For 16 bit accuracy
nc = 200;       // For 23 bits
//nc = 1500;    // For 32 bits
//nc = 5000;    // For 42 bits

array = (double*) Cache_Aligned_Allocate (4*nc*8);
x     = (double*) Cache_Aligned_Allocate (nc*8);
y     = (double*) Cache_Aligned_Allocate (nc*8);
ys    = (double*) Cache_Aligned_Allocate (nc*8);

fpl = fopen ("Data/memDEC.mem", "r");
fscanf (fpl, "%i", &ncoef);

// check if the right thing was read
printf (" ncoef %i\n", ncoef);
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//  read_file 

for  (i=0; i<ncoef ; i++)  { 

fscanf  (fpl,  "%lf",  &val); 
array [i*4]  =  val; 

fscanf  (fpl,  "%lf",  &val) ; 
array[i*4+l]  =  val; 

fscanf  (fpl,  "%lf",  &val) ; 
array[i*4+2]  =  val; 

fscanf  (fpl,  "%lf",  &val) ; 
array[i*4+3]  =  val; 

}  //  end  read_file 

/*  //  print_array 

for  (i=0; i<ncoef ; i++)  { 

printf  ("  endpt  %10.6f  a  %10.6f  b  %10.6f  c  %10.6f\n", 

array [4*i+0] , 

array [ 4*i+l ] , 

array [ 4  *i+2 ] , 

array [4*i+3] ) ; 

}  / /  end  print  array 


npts  =  100; 

//  create_samples 
for  (ir=0; ir<npts; ir++)  { 

arr  indx  =  ir  %  ncoef; 
x[ir]  =  array [arr  indx*4]; 

printf  ("ir  %3i  x_values  are:  %lf \n" , ir , x [ ir ] ) ; 

}  //end  create_samples 

printf  ("main  ncoef  %i  npts  %i\n", ncoef , npts) ; 

subr_map  (array,  ncoef,  x,  y,  ys,  npts,  &tm0,  &tml,  mapnum) ; 

printf  ("\n************  BACK  FROM  MAP  **********\n'')  ; 
printf  ("%lld  clocks\n",  tmO); 
printf  ("%lld  clocks\n",  tml); 

for  (i=0; i<npts; i++)  { 

printf  ("x:  %5.181f  ysubr:  %5.181f  SRCMacro2Ax:  %5.18f\n 
x  [  i ] ,  y [ i ] ,  ys  [  i ] )  ; 

//  printf  ("x:  %5.18f  ysubr:  %5 . 18f \n" , x [i] , y [i] ) ; 


/ /  !  release  the  map  resources 

map  free  (nmap) ; 
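The listing above reads the coefficient file without checking the return value of fscanf. As a minimal sketch only, the same "count, then four values per segment" layout could be read defensively as shown below. The function name read_segments and its parameters are hypothetical and are not part of the thesis code; the sketch also uses %d where the listing uses %i.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read "<count>" followed by count rows of
 * "endpoint a b c" into table (4 doubles per row).
 * Returns the number of rows read, or -1 if the file is malformed
 * or holds more rows than the caller allocated. */
int read_segments(FILE *fp, double *table, int max_rows)
{
    int n, i, k;

    assert(fp != NULL && table != NULL);

    if (fscanf(fp, "%d", &n) != 1 || n < 0 || n > max_rows)
        return -1;

    for (i = 0; i < n; i++)
        for (k = 0; k < 4; k++)
            if (fscanf(fp, "%lf", &table[4 * i + k]) != 1)
                return -1;   /* short or non-numeric row */

    return n;
}
```

In main, a call such as read_segments(fp1, array, nc) would replace the four fscanf calls per segment and would also guard against a file that holds more segments than the nc rows that were allocated.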


b.  subr.mc


#include <libmap.h>

void subr_map (double ac[],
               int ncoef,
               double xc[],
               double yc[],
               double ys[],
               int npts,
               int64_t *time0,
               int64_t *time1,
               int mapno)
{
    /******************************************************************
     * Declarations
     ******************************************************************/
    OBM_BANK_A (ysmap, double, MAX_OBM_SIZE)
    OBM_BANK_B (a,     double, MAX_OBM_SIZE)
    OBM_BANK_C (b,     double, MAX_OBM_SIZE)
    OBM_BANK_D (c,     double, MAX_OBM_SIZE)
    OBM_BANK_E (x,     double, MAX_OBM_SIZE)
    OBM_BANK_F (y,     double, MAX_OBM_SIZE)

    int i, j, nbytes, indx, sel;
    int64_t tm0, tm1;
    double varx;

    /******************************************************************
     * Read in the coeff and segment endpoints
     ******************************************************************/
    nbytes = 4*ncoef * 8; /* 4 data values (seg,a,b,c), 64bits each */
    DMA_CPU (CM2OBM, ysmap, MAP_OBM_stripe(1,"A,B,C,D"), ac, 1, nbytes, 0);
    wait_DMA (0);

    /******************************************************************
     * Read in the Number of points
     ******************************************************************/
    nbytes = npts * 8;
    DMA_CPU (CM2OBM, x, MAP_OBM_stripe(1,"E"), xc, 1, nbytes, 0);
    wait_DMA (0);

    /******************************************************************
     * Useful in Debug Mode to determine when in Map
     ******************************************************************/
    printf ("\n\n************ NOW IN MAP **********\n");
    printf ("MAP subr ncoef %i npts %i\n",ncoef,npts);

    /******************************************************************
     * Read timer and use a constant for UNIFORM Segmentation
     ******************************************************************/
    read_timer (&tm0);
    for (i=0; i<npts; i++)
    {
        varx = x[i];

        if (varx <= 1.010456600772177400)
            sel = 1;
        else if (varx <= 1.254138569173091300)
            sel = 2;
        else if (varx <= 1.393018722518969900)
            sel = 3;
        else if (varx <= 1.414213562373095100)
            sel = 4;

        switch (sel)
        {
        case 1:
            select_pri_64bit_32val (
                varx <= 0.065896761049097793, 0,
                varx <= 0.113411555832503830, 1,
                varx <= 0.155068672182882060, 2,
                varx <= 0.193392483833442240, 3,
                varx <= 0.229466279456250750, 4,
                varx <= 0.263888271986404410, 5,
                varx <= 0.297033228392699020, 6,
                varx <= 0.329159950015850300, 7,
                varx <= 0.360453699017791120, 8,
                varx <= 0.391055896896180420, 9,
                varx <= 0.421076852419192020, 10,
                varx <= 0.450608489560304140, 11,
                varx <= 0.479725761713275970, 12,
                varx <= 0.508490894337077280, 13,
                varx <= 0.536959041815795230, 14,
                varx <= 0.565178287458633520, 15,
                varx <= 0.593191057714890110, 16,
                varx <= 0.621035536388932610, 17,
                varx <= 0.648747078855175910, 18,
                varx <= 0.676359626273058010, 19,
                varx <= 0.703904291372063890, 20,
                varx <= 0.731409358451725390, 21,
                varx <= 0.758903111811573990, 22,
                varx <= 0.786413835751141770, 23,
                varx <= 0.813969814569960310, 24,
                varx <= 0.841596504137608230, 25,
                varx <= 0.869322188753617440, 26,
                varx <= 0.897175152717519460, 27,
                varx <= 0.925183680328846240, 28,
                varx <= 0.953378884317082620, 29,
                varx <= 0.981791877411713590, 30,
                31, &indx);
            break;
        case 2:
            select_pri_64bit_8val (
                varx <= 1.039409823987864900, 32,
                varx <= 1.068692559293097600, 33,
                varx <= 1.098347233137172600, 34,
                varx <= 1.128421928829294700, 35,
                varx <= 1.158970386538573600, 36,
                varx <= 1.190056245938955900, 37,
                varx <= 1.221750217779271400, 38,
                39, &indx);
            break;
        case 3:
            select_pri_64bit_4val (
                varx <= 1.287320295168777200, 40,
                varx <= 1.321418432469292400, 41,
                varx <= 1.356585716292107800, 42,
                43, &indx);
            break;
        case 4:
            select_pri_64bit_4val (
                varx <= 1.414213562373095100, 44,
                varx <= 1.414213562373095100, 44,
                varx <= 1.414213562373095100, 44,
                44, &indx);
            break;
        }

        y[i] = a[indx]*varx*varx + varx*b[indx] + c[indx];

        // printf ("i %3i a %f b %f c %f x %20.18f y %20.18f\n",
        //         indx,a[indx],b[indx],c[indx],varx,y[i]);
    }

    read_timer (&tm1);
    *time0 = tm1-tm0;

    read_timer (&tm0);

    // Function 1
    for (i=0; i<npts; i++)
        ysmap[i] = (1/sqrtf(2*3.14159265358979))*powf(2.71828182845905,-0.5*powf(x[i],2)); // func 14
        //ysmap[i] = powf(2,x[i]);                           // func 1
        //ysmap[i] = 1/x[i];                                 // func 2
        //ysmap[i] = sqrtf(x[i]);                            // func 3
        //ysmap[i] = 1/sqrtf(x[i]);                          // func 4
        //ysmap[i] = logf(x[i])/0.693147180559945;           // func 5
        //ysmap[i] = logf(x[i]);                             // func 6
        //ysmap[i] = sinf(x[i]*3.14159265358979);            // func 7
        //ysmap[i] = cosf(x[i]*3.14159265358979);            // func 8
        //ysmap[i] = tanf(x[i]*3.14159265358979);            // func 9
        //ysmap[i] = sqrt(-1*logf(x[i]));                    // func 10
        //ysmap[i] = powf(tanf(x[i]*3.14159265358979),2);    // func 11
        //ysmap[i] = -(x[i]*logf(x[i])/0.69314718055994 + (1-x[i])*logf(1-x[i])/0.69314718055994); // func 12
        //ysmap[i] = 1/(1+powf(0.693147180559945,(-1*x[i]))); // func 13
        //ysmap[i] = (1/sqrtf(2*3.14159265358979))*powf(2.71828182845905,-0.5*powf(x[i],2)); // func 14
        //ysmap[i] = sinf(powf(2.71828182845905, x[i]));     // func 15

    read_timer (&tm1);
    *time1 = tm1 - tm0;

    /******************************************************************
     * Send back the results
     ******************************************************************/
    nbytes = npts * 8;
    DMA_CPU (OBM2CM, y, MAP_OBM_stripe(1,"F"), yc, 1, nbytes, 0);
    wait_DMA (0);

    nbytes = npts * 8;
    DMA_CPU (OBM2CM, ysmap, MAP_OBM_stripe(1,"A"), ys, 1, nbytes, 0);
    wait_DMA (0);
}
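The MAP routine above selects a segment with a two-level priority selector and then evaluates a*x*x + b*x + c for that segment. For checking results on an ordinary CPU, the same lookup can be sketched in plain C. This is an illustration under stated assumptions, not the thesis implementation: the segment_t type and the names find_segment and eval_quadratic are hypothetical, a binary search stands in for the select_pri macros, and the polynomial is evaluated in Horner form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: upper endpoint of a segment and the
 * coefficients of its quadratic a*x^2 + b*x + c. */
typedef struct {
    double end;      /* segment covers (previous end, end] */
    double a, b, c;
} segment_t;

/* Return the index of the first segment whose endpoint is >= x.
 * Inputs beyond the last endpoint fall into the last segment,
 * which mirrors the default value of a priority selector. */
size_t find_segment(const segment_t *seg, size_t n, double x)
{
    size_t lo = 0, hi;

    assert(seg != NULL && n > 0);
    hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (x <= seg[mid].end)
            hi = mid;        /* answer is mid or to its left */
        else
            lo = mid + 1;    /* answer is to the right of mid */
    }
    return lo;
}

/* Evaluate the quadratic of the segment that contains x. */
double eval_quadratic(const segment_t *seg, size_t n, double x)
{
    const segment_t *s = &seg[find_segment(seg, n, x)];
    return (s->a * x + s->b) * x + s->c;   /* Horner form */
}
```

The binary search assumes the endpoints are sorted in increasing order, as they are in the selector tables above. It trades the single-cycle hardware selector for O(log n) comparisons, which is adequate for a software reference model.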


2.  Fixed  Point 


a.  Main.c 


#include <stdio.h>
#include <stdlib.h>
#include <strings.h>
#include <libmap.h>
#include <math.h>

// Subroutine initialization in Main
void subr_map (int64_t acoef[],
               int ncoef,
               int64_t x[],
               int64_t y[],
               int xpts,
               int64_t *time0,
               int mapnum);

// MAIN
main () {

    // Initialize Variables
    FILE *fp1;
    int i, ir, nc, xpts;
    int mapnum, nmap, ncoef;
    int arr_indx;
    int64_t *arraym, *xm, *ym;
    int64_t tm0, tm1, hexval;
    char hexstr[80], *token, *stpstr, strDelimit[] = " \n";

    // Starting NFG
    printf ("\n*** START UP THE NFG ***\n");
    mapnum = 0;
    nmap = 1;

    // ! allocate map to this problem
    map_allocate (nmap);
    nc = 300;

    // array is enough room to hold 4 64 bit data pieces
    // Perform cache alignment
    arraym = (int64_t *)Cache_Aligned_Allocate (4*nc*8);
    xm     = (int64_t *)Cache_Aligned_Allocate (nc*8);
    ym     = (int64_t *)Cache_Aligned_Allocate (nc*8);

    // Open the Hex data file to read
    fp1 = fopen ("Data/memHEXOx.mem","r");
    printf ("fp1 %i\n",fp1);

    // Read in the number of segments (decimal #)
    fscanf (fp1, "%i", &ncoef);
    printf (" ncoef %i\n",ncoef);

    // Get rid of first npc
    fgets (hexstr, sizeof hexstr, fp1);

    // Read all endpoints and coefficients into OBM banks
    for (i=0;i<ncoef;i++) {
        fgets (hexstr, sizeof hexstr, fp1);

        token = strtok(hexstr,strDelimit);
        sscanf (token, "%llx", &hexval);
        arraym[i*4] = hexval;

        token = strtok(NULL,strDelimit);
        sscanf (token, "%llx", &hexval);
        arraym[i*4+1] = hexval;

        token = strtok(NULL,strDelimit);
        sscanf (token, "%llx", &hexval);
        arraym[i*4+2] = hexval;

        token = strtok(NULL,strDelimit);
        sscanf (token, "%llx", &hexval);
        arraym[i*4+3] = hexval;
    }
    fclose(fp1);

    // print out the contents of the array first 30 elements only
    for (i=0;i<30;i++) {
        printf ("endpoint: %llx a: %llx b: %llx c: %llx \n",
                arraym[i*4],arraym[i*4+1],arraym[i*4+2],arraym[i*4+3]);
    }

    // create some values to test with
    xpts = 30;
    for (ir=0;ir<xpts;ir++) {
        //arr_indx = (int) fabs(remainder(ir,20));
        arr_indx = ir % ncoef;
        xm[ir] = arraym[arr_indx*4];//+0xa0000000;


b.  subr.mc 


#include <libmap.h>

void subr_map (int64_t ac[],
               int ncoef,
               int64_t xc[],
               int64_t yc[],
               int xpts,
               int64_t *time0,
               int mapno) {

    /****************************************************************
     * Declarations
     ****************************************************************/
    OBM_BANK_A (segend, int64_t, MAX_OBM_SIZE)
    OBM_BANK_B (a,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_C (b,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_D (c,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_E (x,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_F (y,      int64_t, MAX_OBM_SIZE)

    int i, j, nbytes, sel;
    int64_t tm0, tm1,
            indx, varx, varsq, vara, varb, varc, ax2, bx1, fx;

    /****************************************************************
     * Read into OBM. Coeff & segment endpoints
     ****************************************************************/
    // 4 data values (seg,a,b,c), 64bit Hex values
    nbytes = 4*ncoef * 8;
    DMA_CPU (CM2OBM, segend, MAP_OBM_stripe(1,"A,B,C,D"), ac, 1, nbytes, 0);
    wait_DMA (0);

    // Read in the Number of points
    nbytes = xpts * 8;
    DMA_CPU (CM2OBM, x, MAP_OBM_stripe(1,"E"), xc, 1, nbytes, 0);
    wait_DMA (0);

    // DEBUG: determine when in Map
    printf ("\n\n************ NOW IN MAP **********\n");
    printf ("MAP subr ncoef %i xpts %i \n",ncoef,xpts);

    /****************************************************************
     * Read timer and use selector to determine the segment
     ****************************************************************/

    read_timer (&tm0);
    for (i=0; i<xpts; i++)
    {
        varx = x[i];

        if (varx <= 0x000000001816a7a6)
            sel = 1;
        else if (varx <= 0x000000003b3b34a8)
            sel = 2;
        else if (varx <= 0x0000000040000000)
            sel = 3;

        switch (sel)
        {
        case 1:
            select_pri_64bit_128val (
                varx <= 0x0000000000841cdf, 0,
                varx <= 0x0000000000885b08, 1,
                varx <= 0x00000000008cbea6, 2,
                varx <= 0x000000000091438e, 3,
                varx <= 0x000000000095edeb, 4,
                varx <= 0x00000000009abdbc, 5,
                varx <= 0x00000000009fb301, 6,
                varx <= 0x0000000000a4d1e3, 7,
                varx <= 0x0000000000aa1a64, 8,
                varx <= 0x0000000000af8c81, 9,
                varx <= 0x0000000000b5283d, 10,
                varx <= 0x0000000000baf1bf, 11,
                varx <= 0x0000000000c0e908, 12,
                varx <= 0x0000000000c71241, 13,
                varx <= 0x0000000000cd6d6a, 14,
                varx <= 0x0000000000d3fa84, 15,
                varx <= 0x0000000000dabdb8, 16,
                varx <= 0x0000000000e1b705, 17,
                varx <= 0x0000000000e8e66b, 18,
                varx <= 0x0000000000f05015, 19,
                varx <= 0x0000000000f7f401, 20,
                varx <= 0x0000000000ffd65a, 21,
                varx <= 0x000000000107f71f, 22,
                varx <= 0x0000000001105651, 23,
                varx <= 0x000000000118f818, 24,
                varx <= 0x000000000121e09e, 25,
                varx <= 0x00000000012b0fe3, 26,
                varx <= 0x00000000013485e7, 27,
                varx <= 0x00000000013e46d4, 28,
                varx <= 0x00000000014856d2, 29,
                varx <= 0x000000000152b5e2, 30,
                varx <= 0x00000000015d6404, 31,
                varx <= 0x000000000168698a, 32,
                varx <= 0x000000000173c675, 33,
                varx <= 0x00000000017f7ac4, 34,
                varx <= 0x00000000018b8aa1, 35,
                varx <= 0x000000000197fa35, 36,
                varx <= 0x0000000001a4cda9, 37,
                varx <= 0x0000000001b204fe, 38,
                varx <= 0x0000000001bfa45d, 39,
                varx <= 0x0000000001cdabc6, 40,
                varx <= 0x0000000001dc238b, 41,
                varx <= 0x0000000001eb0bad, 42,
                varx <= 0x0000000001fa6855, 43,
                varx <= 0x00000000020a3dac, 44,
                varx <= 0x00000000021a8fdc, 45,
                varx <= 0x00000000022b5ee4, 46,
                varx <= 0x00000000023cb318, 47,
                varx <= 0x00000000024e8c77, 48,
                varx <= 0x000000000260ef2a, 49,
                varx <= 0x000000000273e386, 50,
                varx <= 0x0000000002876988, 51,
                varx <= 0x00000000029b8985, 52,
                varx <= 0x0000000002b0437b, 53,
                varx <= 0x0000000002c59fbf, 54,
                varx <= 0x0000000002db9e4f, 55,
                varx <= 0x0000000002f2477f, 56,
                varx <= 0x0000000003099f78, 57,
                varx <= 0x000000000321ae8c, 58,
                varx <= 0x00000000033a74bc, 59,
                varx <= 0x000000000353fa5a, 60,
                varx <= 0x00000000036e4390, 61,
                varx <= 0x00000000038958b0, 62,
                varx <= 0x0000000003a539bb, 63,
                varx <= 0x0000000003c1f32c, 64,
                varx <= 0x0000000003df892d, 65,
                varx <= 0x0000000003fdffe7, 66,
                varx <= 0x00000000041d5fac, 67,
                varx <= 0x00000000043db0d1, 68,
                varx <= 0x00000000045efba6, 69,
                varx <= 0x0000000004814457, 70,
                varx <= 0x0000000004a48f0b, 71,
                varx <= 0x0000000004c8e83f, 72,
                varx <= 0x0000000004ee541c, 73,
                varx <= 0x000000000514df1f, 74,
                varx <= 0x00000000053c8d71, 75,
                varx <= 0x0000000005656765, 76,
                varx <= 0x00000000058f7975, 77,
                varx <= 0x0000000005bac7cd, 78,
                varx <= 0x0000000005e75ee8, 79,
                varx <= 0x0000000006154718, 80,
                varx <= 0x00000000064488b1, 81,
                varx <= 0x000000000675302f, 82,
                varx <= 0x0000000006a745e3, 83,
                varx <= 0x0000000006dad222, 84,
                varx <= 0x00000000070fe58f, 85,
                varx <= 0x0000000007468456, 86,
                varx <= 0x00000000077ebf1a, 87,
                varx <= 0x0000000007b89e30, 88,
                varx <= 0x0000000007f42e12, 89,
                varx <= 0x0000000008317b3d, 90,
                varx <= 0x000000000870922d, 91,
                varx <= 0x0000000008b17f5e, 92,
                varx <= 0x0000000008f45375, 93,
                varx <= 0x00000000093916c6, 94,
                varx <= 0x00000000097fd9f5, 95,
                varx <= 0x0000000009c8a97f, 96,
                varx <= 0x000000000a139609, 97,
                varx <= 0x000000000a60b038, 98,
                varx <= 0x000000000ab00489, 99,
                varx <= 0x000000000b01a3a1, 100,
                varx <= 0x000000000b559e25, 101,
                varx <= 0x000000000bac04bb, 102,
                varx <= 0x000000000c04e808, 103,
                varx <= 0x000000000c605cdb, 104,
                varx <= 0x000000000cbe6fb0, 105,
                varx <= 0x000000000d1f3980, 106,
                varx <= 0x000000000d82c6c5, 107,
                varx <= 0x000000000de93079, 108,
                varx <= 0x000000000e528741, 109,
                varx <= 0x000000000ebedfeb, 110,
                varx <= 0x000000000f2e5370, 111,
                varx <= 0x000000000fa0f275, 112,
                varx <= 0x000000001016d5f3, 113,
                varx <= 0x00000000109016e0, 114,
                varx <= 0x00000000110cca0d, 115,
                varx <= 0x00000000118d0871, 116,
                varx <= 0x000000001210e6db, 117,
                varx <= 0x000000001298826c, 118,
                varx <= 0x000000001323f41d, 119,
                varx <= 0x0000000013b3590f, 120,
                varx <= 0x000000001446c611, 121,
                varx <= 0x0000000014de5c6e, 122,
                varx <= 0x00000000157a3947, 123,
                varx <= 0x00000000161a7594, 124,
                varx <= 0x0000000016bf32a0, 125,
                varx <= 0x0000000017688d8d, 126,
                127, &indx);
            break;
        case 2:
            select_pri_64bit_32val (
                varx <= 0x0000000018c9a234, 128,
                varx <= 0x0000000019819a5b, 129,
                varx <= 0x000000001a3eb58d, 130,
                varx <= 0x000000001b01193f, 131,
                varx <= 0x000000001bc8e294, 132,
                varx <= 0x000000001c963b27, 133,
                varx <= 0x000000001d694444, 134,
                varx <= 0x000000001e422789, 135,
                varx <= 0x000000001f210a6a, 136,
                varx <= 0x0000000020061683, 137,
                varx <= 0x0000000020f17574, 138,
                varx <= 0x0000000021e350d9, 139,
                varx <= 0x0000000022dbd679, 140,
                varx <= 0x0000000023db2ff2, 141,
                varx <= 0x0000000024e18b0b, 142,
                varx <= 0x0000000025ef158a, 143,
                varx <= 0x000000002703fd37, 144,
                varx <= 0x0000000028207402, 145,
                varx <= 0x000000002944a7b1, 146,
                varx <= 0x000000002a70ce5e, 147,
                varx <= 0x000000002ba519fa, 148,
                varx <= 0x000000002ce1c09e, 149,
                varx <= 0x000000002e26f43b, 150,
                varx <= 0x000000002f74ef13, 151,
                varx <= 0x0000000030cbe73f, 152,
                varx <= 0x00000000322c12da, 153,
                varx <= 0x000000003395ac27, 154,
                varx <= 0x000000003508f191, 155,
                varx <= 0x0000000036861933, 156,
                varx <= 0x00000000380d5d4f, 157,
                varx <= 0x00000000399efc52, 158,
                159, &indx);
            break;
        case 3:
            select_pri_64bit_4val (
                varx <= 0x000000003ce244bd, 160,
                varx <= 0x000000003e9466d5, 161,
                varx <= 0x0000000040000000, 162,
                162, &indx);
            break;
        }

        // Shift by 8 bits
        vara = a[indx];
        varb = b[indx];
        varx >>= 8;    // Shift right 8 for mult 40.24
        vara >>= 8;    // Shift right 8 for mult 40.24
        varb >>= 8;    // Shift right 8 for mult 40.24

        // -- Square X and shift --//
        varsq = varx*varx;                     // Now we have 80.48 -> 16.48
        varsq >>= 24;                          // SRL eliminate 40.24
        varsq = varsq & 0x000000FFFFFFFFFF;    // bitwise AND; 24bits

        // -- X^2 * first Coefficient --//
        ax2 = varsq*vara;                      // a[indx];
        ax2 >>= 16;                            // Want 32.32, so srl 16
        if (vara < 0x8000000000000000)         // if both +ve
            ax2 = ax2 & 0x0000FFFFFFFFFFFF;    // bitwise AND; 16bits

        // -- X * second Coefficient --//
        bx1 = varx*varb;                       // both are already shifted
        bx1 >>= 16;                            // Return to 32.32 (int.fract)
        if (varb < 0x8000000000000000)         // if both +ve
            bx1 = bx1 & 0x0000FFFFFFFFFFFF;    // bitwise AND; 16bits

        // -- 3 input add to complete --//
        y[i] = ax2+bx1+c[indx];                // Add all, no need to shift varc

        // DEBUG: printf for debug information on variable status
        printf ("indx: %3i, varx: %8llx vasq: %10llx a: %10llx ax2: %16llx b: %16llx bx1: %16llx c: %10llx fx: %16llx \n",
                (int)indx, varx, varsq, vara, ax2,
                varb, bx1, c[indx], y[i]);
    }

    // Time it took to compute
    read_timer (&tm1);
    *time0 = tm1-tm0;

    /* Send back the results */
    nbytes = xpts * 8;
    DMA_CPU (OBM2CM, y, MAP_OBM_stripe(1,"F"), yc, 1, nbytes, 0);
    wait_DMA (0);
}
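The fixed-point routine above keeps values in a 32.32 format (32 integer bits, 32 fractional bits). Before each multiplication it drops 8 fractional bits from both operands, and after the multiplication it shifts the product right by 16 to return to 32 fractional bits. The sketch below reproduces that scaling on a CPU for non-negative operands. It is an illustration only: the helper names to_fixed, to_double, mul_q32 and quad_q32 are hypothetical, and the masking steps of the MAP code are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Conversion helpers for illustration; 4294967296.0 is 2^32. */
int64_t to_fixed(double v)   { return (int64_t)(v * 4294967296.0); }
double  to_double(int64_t q) { return (double)q / 4294967296.0; }

/* Multiply two 32.32 values.
 * Each operand is shifted right by 8 (24 fractional bits remain),
 * the product then has 48 fractional bits, and a final shift by 16
 * returns it to 32 fractional bits.
 * Assumes the operands are small enough that the 64-bit product does
 * not overflow; right shifts of negative values are
 * implementation-defined in C, so this sketch is intended for
 * non-negative inputs. */
int64_t mul_q32(int64_t p, int64_t q)
{
    int64_t prod;

    assert(p >= 0 && q >= 0);
    prod = (p >> 8) * (q >> 8);
    return prod >> 16;
}

/* Evaluate a*x^2 + b*x + c with all values in 32.32 format. */
int64_t quad_q32(int64_t a, int64_t b, int64_t c, int64_t x)
{
    int64_t x2 = mul_q32(x, x);              /* x^2 */
    return mul_q32(a, x2) + mul_q32(b, x) + c;
}
```

For example, quad_q32(to_fixed(0.5), to_fixed(1.5), to_fixed(0.25), to_fixed(0.5)) evaluates 0.5*0.25 + 1.5*0.5 + 0.25. Dropping 8 bits per operand limits each product to 24 fractional bits of input precision, which is the price paid for fitting the product into 64 bits.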


3.  Fixed  Point  with  Macro 

This  implementation  did  not  produce  the  correct  values.  The  multiplier  macro 
used  in  this  case  was  the  VHDL  macro  shown  in  Appendix  B. 
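When a hardware multiplier macro returns unexpected values, a bit-exact software model of the intended operation is a useful reference for comparison. The sketch below is one such model, written under the assumption that the macro is meant to return the upper 32 bits of a signed 32 x 32-bit product. That assumption, and the name mult_hi32, are illustrative and are not taken from the thesis; the sketch does not claim to identify why the implementation failed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: upper 32 bits of a signed 32x32 product.
 * Both operands are widened to 64 bits before multiplying, so the
 * product cannot overflow. The right shift of a negative value is
 * implementation-defined in C (arithmetic on common compilers). */
int32_t mult_hi32(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* full 64-bit product */
    return (int32_t)(wide >> 32);             /* keep the upper half */
}
```

Note that multiplying two int values directly and shifting the int result by 32 would not give the same answer, because the product is truncated to 32 bits before the shift and a shift by the full width of the type is undefined in C.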

The  user  can  add  macros  to  the  Makefile  that  are  coded  in  VHDL,  Verilog  or  in 
both  description  languages.  Here  we  show  two  VHDL  files  added  to  the  Makefile  and  the 
blk.v  and  info  files. 


a.  Makefile 

# $Id: Makefile,v 2.0.0.1 2005/06/10 23:12:59 hammes Exp $
#
# Copyright 2003 SRC Computers, Inc. All Rights Reserved.
#
# Manufactured in the United States of America.
#
# SRC Computers, Inc.
# 4240 N Nevada Avenue
# Colorado Springs, CO 80907
# (v) (719) 262-0213
# (f) (719) 262-0223
#
# No permission has been granted to distribute this software
# without the express permission of SRC Computers, Inc.
#
# This program is distributed WITHOUT ANY WARRANTY OF ANY KIND.
#
#
# ------------------------------------------------------------------
# User defines FILES, MAPFILES, and BIN here
# ------------------------------------------------------------------
FILES    = main.c
MAPFILES = subr.mc
BIN      = nfg

# ------------------------------------------------------------------
# Multi chip info provided here
# (Leave commented out if not used)
# ------------------------------------------------------------------
#PRIMARY   = <primary file 1> <primary file 2>
#SECONDARY = <secondary file 1> <secondary file 2>
#CHIP2     = <file to compile to user chip 2>

# ------------------------------------------------------------------
# User defined directory of code routines
# that are to be inlined
# ------------------------------------------------------------------
#INLINEDIR =

# ------------------------------------------------------------------
# User defined macros info supplied here
#
# (Leave commented out if not used)
# ------------------------------------------------------------------
#MACROS     = my_macro1/mult_vrlg_64.v
#MY_BLKBOX  = my_macro1/blk.v
#MY_NGO_DIR = my_macro1
#MY_INFO    = my_macro1/info

MACROS     = my_macro/mult_32to32.vhd \
             my_macro/add_32.vhd
MY_BLKBOX  = my_macro/blk.v
MY_NGO_DIR = my_macro
MY_INFO    = my_macro/info

# ------------------------------------------------------------------
# Floating point macros selection
# ------------------------------------------------------------------
#FPMODE = SRC_IEEE_V1   # Default SRC version IEEE
#FPMODE = SRC_IEEE_V2   # Size reduced SRC IEEE with
                        # special rounding mode

# ------------------------------------------------------------------
# User supplied MCC and MFTN flags
# ------------------------------------------------------------------
MCCFLAGS  = -log -explain dep -g -keep -use par
MFTNFLAGS = -log -v

# ------------------------------------------------------------------
# User supplied flags for C & Fortran compilers
# ------------------------------------------------------------------
CC  = icc     # icc for Intel cc for Gnu
FC  = ifort   # ifort for Intel f77 for Gnu
LD  = icc     # for C codes
#LD = ifort   # for Fortran or C/Fortran mixed

CFLAGS  =
FFLAGS  =
LDFLAGS =     # Flags to include libs if needed

# ------------------------------------------------------------------
# VCS simulation settings
# (Set as needed, otherwise just leave commented out)
# ------------------------------------------------------------------
#USEVCS  = yes   # YES or yes to use vcs instead of vcsi
#VCSDUMP = yes   # YES or yes to generate vcd+ trace dump

# ------------------------------------------------------------------
# No modifications are required below
# ------------------------------------------------------------------
MAKIN ?= $(MC_ROOT)/opt/srcci/comp/lib/AppRules.make
include $(MAKIN)

b.  subr.mc 


#include <libmap.h>

void subr_map (int64_t ac[],
               int ncoef,
               int64_t xc[],
               int64_t yc[],
               int xpts,
               int64_t *time0,
               int mapno) {

    /****************************************************************
     * Declarations
     ****************************************************************/
    OBM_BANK_A (segend, int64_t, MAX_OBM_SIZE)
    OBM_BANK_B (a,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_C (b,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_D (c,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_E (x,      int64_t, MAX_OBM_SIZE)
    OBM_BANK_F (y,      int64_t, MAX_OBM_SIZE)

    int i, j, nbytes;
    int64_t tm0, tm1, indx;
    int varx, vara, varb, varc, prod3, prod2, prod1, fx;
    int xg, ag, bg, cg;

    /****************************************************************
     * Read into OBM. Coeff & segment endpoints
     ****************************************************************/
    // 4 data values (seg,a,b,c), 64bit Hex values
    nbytes = 4*ncoef * 8;
    DMA_CPU (CM2OBM, segend, MAP_OBM_stripe(1,"A,B,C,D"), ac, 1, nbytes, 0);
    wait_DMA (0);

    // Read in the Number of points
    nbytes = xpts * 8;
    DMA_CPU (CM2OBM, x, MAP_OBM_stripe(1,"E"), xc, 1, nbytes, 0);
    wait_DMA (0);

    // DEBUG: Tell me I'm in the MAP
    printf ("\n\n************ NOW IN MAP **********\n");
    printf ("MAP subr ncoef %i xpts %i \n",ncoef,xpts);

    /****************************************************************
     * Read timer and use selector to determine the segment
     ****************************************************************/
    read_timer (&tm0);
    for (i=0; i<xpts; i++)
    {
        split_64to32 (x[i], &xg, &varx);

        // SEGMENT INDEX ENCODER
        // Based on x input, determine which index to select
        // the coefficients for approximation
        select_pri_32bit_16val (
            varx <= 0x12de, 0,
            varx <= 0x2087, 1,
            varx <= 0x2c8c, 2,
            varx <= 0x37a9, 3,
            varx <= 0x422b, 4,
            varx <= 0x4c45, 5,
            varx <= 0x5613, 6,
            varx <= 0x5faa, 7,
            varx <= 0x6916, 8,
            varx <= 0x7268, 9,
            varx <= 0x7bac, 10,
            varx <= 0x7fff, 11,
            varx <= 0x7fff, 11,
            varx <= 0x7fff, 11,
            varx <= 0x7fff, 11,
            11, &indx);

        indx = i%12;

        split_64to32 (a[indx], &ag, &vara);
        split_64to32 (b[indx], &bg, &varb);
        split_64to32 (c[indx], &cg, &varc);

        // use macro multiplier
        my_mult (varx, varx, &prod1);      // prod1 = x^2 term
        // Perform together
        my_mult (prod1, vara, &prod2);     // prod2 = ax^2 term
        my_mult (varx, varb, &prod3);      // prod3 = bx term

        // Perform final add stage
        //my_add (prod2, prod3, varc, &fx);  // 3 input macro adder
        fx = prod2+prod3+varc;             // Perform final add stage

        // Put result in OBM
        y[i] = fx & 0x00000000FFFFFFFF;

        // DEBUG: printf for debug information on variable status
        //printf ("indx: %3i a[]: %llx varb: %x c: %x x: %x fx: %lx, y[]: %llx\n",
        //        indx,a[indx],varb,varc,varx,fx,y[i]);
        // printf ("indx: %3i a: %x b: %x c: %x x: %x fx: %lx, y[]: %llx\n",
        //         indx,vara,varb,varc,varx,fx,y[i]);
        // printf ("prod1: %x prod2: %x prod3: %x \n",
        //         prod1, prod2, prod3);
    } // End for(i=0;i<xpts;i++)

    read_timer (&tm1);
    *time0 = tm1-tm0;

    /****************************************************************
     * Send back the results
     ****************************************************************/
    nbytes = xpts * 8;
    DMA_CPU (OBM2CM, y, MAP_OBM_stripe(1,"F"), yc, 1, nbytes, 0);
    wait_DMA (0);
}


c.  blk.v 

module mult_32to32 (a, b, clk, prod) /* synthesis syn_black_box */ ;
    input  [31:0] a;
    input  [31:0] b;
    output [31:0] prod;
    input         clk;
endmodule

module add_32 (a, b, c, sum) /* synthesis adderparthere */ ;
    input  [31:0] a;
    input  [31:0] b;
    input  [31:0] c;
    output [31:0] sum;
endmodule

d.  info 

BEGIN_DEF "my_mult"
    MACRO = "mult_32to32";
    STATEFUL = NO;
    EXTERNAL = NO;
    PIPELINED = YES;
    LATENCY = 7;

    INPUTS = 2:
        I0 = INT 32 BITS (a)      // explicit input
        I1 = INT 32 BITS (b)      // explicit input
        ;

    OUTPUTS = 1:
        O0 = INT 32 BITS (prod)   // explicit output
        ;

    IN_SIGNAL : 1 BITS "clk" = "CLOCK";

    DEBUG_HEADER = #
        void my_mult_dbg (int a, int b, int *prod);
    #;

    DEBUG_FUNC = #
        void my_mult_dbg (int a, int b, int *prod) {
            *prod = a*b;
            *prod >>= 32;
        }
    #;
END_DEF

BEGIN_DEF "my_add"
    MACRO = "add_32";
    STATEFUL = NO;
    EXTERNAL = NO;
    PIPELINED = NO;
    LATENCY = 1;

    INPUTS = 3:
        I0 = INT 32 BITS (a)      // explicit input
        I1 = INT 32 BITS (b)      // explicit input
        I2 = INT 32 BITS (c)      // explicit input
        ;

    OUTPUTS = 1:
        O0 = INT 32 BITS (sum)    // explicit output
        ;

    DEBUG_HEADER = #
        void my_add_dbg (int a, int b, int c, int *


APPENDIX  D.  COPY  OF  PROFILE  REPORT 


The profile report shows the execution time for non-uniform segmentation with the following parameters: ln(x), ε = 2^(-33) and N = 1,000,000. Profile reports are used to debug functions, optimize files, and identify the bottlenecks in a program. Parent and child functions can be analyzed to find where the program is slow.

The longest times in the report, 62.906 s and 50.703 s, belong to xlabel and ylabel, respectively. These functions were used to display graphs for debugging purposes. Any function used to drive graphics is slow compared to computation. A final version does not require the display, so these times disappear and have no impact.

The next longest times are 29.063 s and 26.359 s, which correspond to multipleQuadApprox and varQuadApproxHybThirdNew, respectively. Note, however, that these are total times: multipleQuadApprox is the parent function of varQuadApproxHybThirdNew. The Self Time column indicates the time that a function spends in its own code; the rest of its total time is spent in its child functions. The child function of varQuadApproxHybThirdNew is chebyRemz, which makes chebyRemz the longest-running part of the code. The child functions of chebyRemz also take considerable time, but chebyRemz is the most suitable metric for comparing the speed of the different functions.
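The relation between the Total Time and Self Time columns can be made concrete with a small calculation. The sketch below uses the chebyRemz row of the profile summary (20.469 s total, 11.328 s self); the function names child_time and self_fraction are illustrative only.

```c
#include <assert.h>

/* Time attributed to child functions: total time minus self time. */
double child_time(double total_s, double self_s)
{
    assert(total_s >= self_s && self_s >= 0.0);
    return total_s - self_s;
}

/* Fraction of a function's total time spent in its own code. */
double self_fraction(double total_s, double self_s)
{
    assert(total_s > 0.0 && self_s >= 0.0);
    return self_s / total_s;
}
```

For chebyRemz, child_time(20.469, 11.328) gives about 9.141 s spent in child functions, so slightly more than half of its total time is spent in chebyRemz itself.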


Profile Summary

Generated 28-Jul-2007 08:59:56

Function name                               Calls   Total Time  Self Time*

UserInput                                   1       0.094 s     0.094 s
ancestor                                    5252    1.141 s     function is recursive
ancestor>isatype                            10504   0.469 s     function is recursive
axes (Opaque-function)                      5252    0.141 s     0.141 s
axescheck                                   18356   0.859 s     0.859 s
cell.intersect                              3       0.016 s     0.000 s
cell.setdiff                                3       0 s         0.000 s
cell.sort                                   15      0.016 s     0.016 s
cell.strmatch                               1       0 s         0.000 s
cell.unique                                 9       0.016 s     0.000 s
cellfun (MEX-function)                      49      0 s         0.000 s
cellstr                                     6       0 s         0.000 s
chebyRemz                                   4182    20.469 s    11.328 s
colstyle                                    1       0 s         0.000 s
deal                                        1       0 s         0.000 s
double.superiorfloat                        34078   0.141 s     0.141 s
fcnchk                                      1       0 s         0.000 s
findall                                     2       0 s         0.000 s
fliplr                                      35592   0.391 s     0.391 s
gca                                         13113   0.922 s     0.578 s
gcf                                         13113   0.391 s     0.391 s
setF                                        1       0.094 s     0.078 s
getappdata                                  6       0 s         0.000 s
graph2d.series.schema>LdoDirtyAction        3       0.016 s     function is recursive
...h2d.series.schema>LdoModeSwitchAction    2       0 s         function is recursive
...series.schema>LdoSetManualModeAction     1       0 s         function is recursive
graph2d.series.schema>LdoYDataAction        1       0 s         function is recursive
graph2d.series.schema>LsetXDataSilently     1       0 s         0.000 s
graphics\private\clo                        2       0.031 s     0.000 s
graphics\private\clo>find_kids              2       0 s         0.000 s
handle.listener (Opaque-function)           6       0 s         0.000 s
hasbehavior                                 2       0 s         0.000 s
hold                                        2623    1.422 s     0.438 s
inline.feval                                        0.031 s     0.016 s
inline.inline                               3       0.094 s     0.000 s
inline.inline>strtrim                       3       0 s         0.000 s
inline.subsref                              46123   12.172 s    1.563 s
inlineeval                                  46287   10.625 s    10.625 s
int2str                                     2622    0.219 s     0.219 s
intersect                                   3       0.016 s     0.016 s
isappdata                                   7871    0.563 s     0.344 s
iscell                                      9       0 s         0.000 s
iscellstr                                   41      0 s         0.000 s
isfield                                     7877    0.219 s     0.219 s
ishghandle                                  5252    0.250 s     0.250 s
ishold                                      1       0 s         0.000 s
iskeyword                                   2636    0.078 s     0.078 s
ismembc (MEX-function)                      1       0 s         0.000 s
ismembc2 (MEX-function)                     1       0 s         0.000 s
ismember                                    8       0 s         0.000 s
isobject                                    1       0 s         0.000 s
ispc                                        6       0 s         0.000 s
isstruct                                    1       0 s         0.000 s
isvarname                                   2638    0.188 s     0.109 s
legend                                      1       0.016 s     function is recursive
legend>find_legend                          1       0 s         function is recursive
legend>islegend                             1       0 s         0.000 s
legendinfo                                  1       0.016 s     function is recursive
legendinfo>check_xydata                     3       0 s         0.000 s
legendinfo>parsestruct                      2       0 s         function is recursive
line (Opaque-function)                      3       0 s         0.000 s
linspace                                    4183    0.391 s     0.391 s
log10                                       2622    0.016 s     0.016 s
maple                                       36      0 s         0.000 s
maplemex (MEX-function)                     36      0 s         0.000 s
meshgrid                                    4       0 s         0.000 s
multipleQuadApprox                          1       29.063 s    0.188 s
newplot                                     2624    1.453 s     0.500 s
newplot>ObserveAxesNextPlot                 2624    0.641 s     0.047 s
newplot>ObserveFigureNextPlot               2624    0.094 s     0.094 s
num2str                                     5244    0.828 s     0.594 s
opaque.double                               11      0 s         0.000 s
parseparams                                 1       0 s         0.000 s
plotdoneevent                               1       0 s         0.000 s
polyval                                     34078   1.922 s     1.781 s
quadl                                       1       0.078 s     0.000 s
quadl>quadlstep                                     0.078 s     function is recursive
scribe.legendinfo (Opaque-function)         4       0 s         function is recursive
scribe.legendinfo.legendinfo                1       0 s         function is recursive
scribe.legendinfochild (Opaque-function)    12      0 s         function is recursive
scribe.legendinfochild.legendinfochild      3       0 s         function is recursive
setdiff                                     7       0.078 s     0.063 s
sortcellchar (MEX-function)                 15      0 s         0.000 s
specgraph.baseline (Opaque-function)        7       0.047 s     function is recursive
specgraph.baseline.baseline                 1       0.047 s     function is recursive
specgraph.stemseries (Opaque-function)      25      0.078 s     function is recursive
specgraph.stemseries.refresh                2       0.016 s     function is recursive
...stemseries.schema>LdoEdgeColorAction     1       0 s         0.000 s
...stemseries.schema>LdoFaceColorAction     1       0 s         0.000 s
...ies.schema>LdoSetManualCodeModeAction    3       0 s         function is recursive
...aph.stemseries.schema>LdoUpdateAction    2       0 s         0.000 s
...series.schema>LdoUpdateBaselineAction    1       0 s         0.000 s
...ies.schema>LdoUpdateChildMarkerAction    2       0 s         0.000 s
...series.schema>LdoUpdateChildrenAction    3       0.016 s     function is recursive
...temseries.schema>LdoUpdateXDataAction    2       0 s         0.000 s
specgraph.stemseries.setLegendInfo          1       0.016 s     function is recursive
specgraph.stemseries.stemseries             1       0.063 s     function is recursive
specgraph\private\checkpvpairs              1       0 s         0.000 s
specgraph\private\nextstyle                 1       0.016 s     0.016 s
stem                                        1       0.109 s     0.016 s
stem>parseargs                              1       0 s         0.000 s
str2num                                     10      0.016 s     0.016 s
str2num>protected_conversion                10      0 s         0.000 s
strmatch                                    1       0 s         0.000 s
sym.abs                                     1       0 s         0.000 s
sym.char                                    16      0.016 s     0.016 s
sym.diff                                    3       0.016 s     0.000 s
sym.eq                                      1       0 s         0.000 s
sym.findsym                                 3       0.016 s     0.000 s
sym.findsym>pickvar                         3       0.016 s     0.016 s
sym.log                                     2       0.047 s     0.000 s
sym.maple                                   10      0.047 s     0.000 s
sym.sqrt                                    2       0.016 s     0.016 s
sym.sym                                     1329    0.266 s     0.063 s
sym.sym>char2sym                            1324    0.203 s     0.094 s
sym.sym>trim                                1324    0.031 s     0.000 s
sym.uminus                                  2       0 s         0.000 s
syms                                        1314    0.578 s     0.219 s
symvar                                      3       0.094 s     0.000 s
symvar>findrun                              12      0.016 s     0.016 s
symvar>isquoted                             3       0 s         0.000 s
title                                       5244    1.234 s     function is recursive
unique                                      4       0.016 s     0.016 s
usev6plotapi                                1       0 s         0.000 s
varQuadApproxHybThirdNew                    1311    26.359 s    3.297 s
vectorize                                   3       0.031 s     0.031 s
xlabel                                      5244    62.906 s    function is recursive
xychk                                       1       0 s         0.000 s
ylabel                                      5244    50.703 s    function is recursive

* Self time is the time spent in a function excluding the time spent in its child functions.
Self time also includes overhead resulting from the process of profiling.
APPENDIX E. LESSONS LEARNED


This section records problems that were encountered while using the SRC-6 and
other software applications in this thesis. The intent is to provide a reference
to specific issues previously encountered and to reduce the time needed to
resolve or understand them in the future.


E.1 FILE NAMING PROBLEMS

Problem: When you compile your VHDL code using Xilinx’s ISE Navigator, it treats
upper- and lower-case versions of letters as the same. That is, adderVerilog.vhd and
adderverilog.vhd are the same file to Xilinx’s ISE Navigator. However, file names on the
SRC are case sensitive. That is, adderVerilog.vhd and adderverilog.vhd are
DIFFERENT files on the SRC-6. So, if you have listed adderverilog.vhd in your
Makefile as a macro, it will not recognize adderVerilog.vhd as the target file.
Additionally, if you let Xilinx create VHDL code from a schematic that contains the
module adderVerilog.vhd, it will refer to the module in the VHDL code as
adderverilog.vhd.

Solution:  Use  lower  case  letters  for  ALL  files. 

Author:  J.T.  Butler 

Date:  26  FEB  07 

E.2 USING THE CONST CONSTRUCT IN C

Problem: A martello64 error is obtained when using

int64_t array[5][5] = { {1,2,3,4,5},
                        {6,7,8,9,10},
                        {11,12,13,14,15},
                        {16,17,18,19,20},
                        {21,22,23,24,25} };

The error is caused by “too many accesses to BRAM”.

Background: This is a correct C construct when used on a PC or workstation.
However, when it is in a .mc file, this declaration will cause a martello64 error. It is
possibly due to too many accesses to a BRAM (arrays are usually stored in BRAM).
This was a problem that Scott Bailey experienced. The initial writeup is based on a
conversation between Scott Bailey and Jon Butler on December 1, 2006.
Solution: In discussing this with Dave Caliga, Scott learned that the Carte™ 2.2 version
should correct this error. At the time the error occurred, we were using Carte™ 2.1.
Apparently, Carte™ 2.2 spaces out the accesses to BRAM so that the array can be
initialized with ALL 25 data values. However, in order to use it in Carte™ 2.2, you need to
declare the array as a constant, like so

const int64_t array[5][5] = { {1,2,3,4,5},
                              {6,7,8,9,10},
                              {11,12,13,14,15},
                              {16,17,18,19,20},
                              {21,22,23,24,25} };

The intent of const is to set up a constant array that is not changed in the rest of the
program, much like a ROM instead of RAM.

Scott Bailey tried to work around this error by simply defining the array without
populating it with initial values, using, for example: int64_t array[5][5]; The
compiler accepted this. He then put the desired values into array using for loops.
These arrays will then work as normal C arrays within the .mc code. However, this
decreases performance, since the values placed into the array must come from either
OBM or streams, access to which incurs a time penalty. Scott believes that the
problem is in putting too many values into BRAM too quickly. In a dialog with Dave
Caliga (SRC Computers), Dave said that the problem occurs when there are more than 8
initialized values placed in the array. Scott believes that this problem will occur in
BOTH Carte™ 2.1 and 2.2 for non-constant BRAM arrays.

Author:  J.T.  Butler 

Date:  26  FEB  07 

E.3  INCORRECT  ARGUMENTS  IN  SYSTEM  SUPPLIED  MACROS 

Problem: A core dump occurs when the call-by-value and call-by-reference conventions
are not adhered to, as in

popcount_64(int64_t a, int array[i])

Instead of an error message, there will be a core dump.

Background: This was provided by Scott Bailey in a conversation with Jon Butler on
December 1, 2006.

Solution: To solve this problem, use the following code.

popcount_64(int64_t a, &temp);
array[i] = temp;

For most system macros, SRC requires that the input values be passed by value
(e.g. a) and all output values be passed by reference (e.g. &temp).

Author:  J.T.  Butler 

Date:  26  FEB  07 

E.4  IF  /  THEN  /  ELSE  LIMITATION 

Problem: When programming in C within the .mc file (no macro), an error occurs when
the “if, then, else” chain is too long (approximately 26 levels).

Background: This was discovered by Prof. Jon Butler when trying to implement a long
“if, then, else” chain during testing.

Solution:  SRC  Carte™  V2.2  fixes  this  problem. 

Author:  T.J.  Mack 

Date:  26  FEB  07 

E.5  MULTIPLE  FILES  USED  IN  A  MACRO 

Problem:  When  using  multiple  files  to  describe  a  circuit  in  a  macro,  the  SRC  won’t 
successfully  compile. 

Background:  This  was  discovered  while  developing  the  NFG  macro  where  different 
modules  are  described  in  separate  VHDL  files. 

Solution:  List  all  of  the  VHDL  files  within  the  Makefile  under  macros,  separated  by  a 
space. 

Author:  T.J.  Mack 

Date:  26  FEB  07 

E.6  XILINX  /  SYNPLIFY  INCONSISTENCIES 

Problem:  VHDL  code  synthesizes  correctly  (no  errors)  in  Xilinx  XST,  but  does  not  in 
Synplify  PRO. 

Background:  When  developing  VHDL  code  for  the  NFG,  the  code  was  originally 
written  in  the  Xilinx  ISE.  Checking  for  errors  using  Xilinx  XST  resulted  in  no  errors. 
When  the  code  was  transported  to  the  SRC,  errors  resulted.  Further  troubleshooting 
produced  the  same  errors  when  using  the  stand-alone  Synplify. 

Solution:  Not  all  code  is  universal.  Always  test  code  using  a  stand-alone  version  of 
Synplify.  If  it  results  in  errors,  the  code  must  be  modified. 

Author:  T.J.  Mack 

Date:  26  FEB  07 
E.7 MODELSIM AND MULTIPLE HDLs

Problem: ModelSim XE (Xilinx Edition), which is obtained for free from the Xilinx
website, does not support multiple HDLs.

Background: When developing the NFG, some code was provided by SRC in Verilog.
When attempting to analyze the circuit with a test bench, an error occurred in ModelSim.
The error stated that ModelSim XE does not support multiple HDLs.

Solution:  Download  ModelSim  SE.  NPS  has  a  license.  Details  available  from  Dan 
Zulaica. 

Author:  T.J.  Mack 

Date:  26  FEB  07 

E.8  INITIALIZING  MEMORY  FROM  A  SEPARATE  FILE 

Problem: Xilinx allows one to synthesize a ROM where the ROM contents are specified
in a separate file. When transferring the VHDL files to the SRC and synthesizing with
Synplify, an error results. This is another artifact of problem E.6 above.

Background:  Because  of  the  potentially  large  amount  of  data  needed  to  load  into  a 
ROM,  it  is  useful  to  have  a  separate  file  with  just  this  data.  The  HDL  must  then  access 
this  data  file  during  synthesis. 

Solution:  Problem  not  completely  solved,  yet.  Some  potential  solutions  are: 

1. Below is a ROM provided by SRC Computers. Written in Verilog (SRC
Computers’ preferred language), it is composed of 32 4-input, 1-bit-output LUTs. It has
a 32-bit output. It is initialized using a separate .sdc file.

module MY_ROM (
    data,
    adr
);

output [31:0] data;
input  [3:0]  adr;

ROM16X1 M0 (
    .O  (data[0]),
    .A0 (adr[0]),
    .A1 (adr[1]),
    .A2 (adr[2]),
    .A3 (adr[3])
);

ROM16X1 M1 (
    .O  (data[1]),
    .A0 (adr[0]),
    .A1 (adr[1]),
    .A2 (adr[2]),
    .A3 (adr[3])
);

*** Fill-In Remaining Modules ***

ROM16X1 M31 (
    .O  (data[31]),
    .A0 (adr[0]),
    .A1 (adr[1]),
    .A2 (adr[2]),
    .A3 (adr[3])
);

endmodule


The ROM initialization values are in the .sdc file below. The INITs are
somewhat cumbersome, since the LUTs are 1 bit wide, so each LUT holds one bit
position for all 16 values. The INIT values essentially represent a 32-row by 16-column
matrix. Each column represents one of the 16 32-bit outputs.

define_attribute {i:M0} xc_props "INIT=ba5d"
define_attribute {i:M1} xc_props "INIT=8801"

*** Fill-In Missing Values ***

define_attribute {i:M31} xc_props "INIT=1321"

This is the most promising example of a ROM with an external file for
initialization. However, the 1-bit format of the INIT values makes it difficult to
implement.
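
The bit-slicing that makes the INIT values cumbersome can be automated. The following is a minimal sketch, not code supplied by SRC: it assumes that bit a of a LUT's 16-bit INIT value is the LUT output at address a, and that the attribute lines follow the define_attribute form. The function names lut_inits, rom_words and sdc_lines are hypothetical.

```python
def lut_inits(words, width=32):
    """Return one 16-bit INIT value per 1-bit LUT (LUT k holds bit k of every word).

    words: the 16 ROM words, indexed by address 0..15.
    Assumption: bit `addr` of an INIT value is the LUT output at address `addr`.
    """
    if len(words) != 16:
        raise ValueError("a 16x1 LUT ROM holds exactly 16 entries")
    inits = []
    for k in range(width):
        init = 0
        for addr, word in enumerate(words):
            init |= ((word >> k) & 1) << addr
        inits.append(init)
    return inits


def rom_words(inits):
    """Inverse of lut_inits: rebuild the 16 ROM words from the per-LUT INIT values."""
    return [
        sum(((init >> addr) & 1) << k for k, init in enumerate(inits))
        for addr in range(16)
    ]


def sdc_lines(inits):
    """Format the INIT values as attribute lines (assumed .sdc syntax)."""
    return [
        f'define_attribute {{i:M{k}}} xc_props "INIT={init:04x}"'
        for k, init in enumerate(inits)
    ]


# Illustrative data only: word a is simply the value a * 3.
example_words = [a * 3 for a in range(16)]
for line in sdc_lines(lut_inits(example_words))[:2]:
    print(line)
```

Generating the attribute lines from the 16 desired output words avoids transposing the 32-by-16 bit matrix by hand, and rom_words gives a quick round-trip check.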

2. Below is another ROM example provided by SRC Computers. It uses the
RAMB16_S18_S18 module, which is a 16 Kb Block RAM with two 18-bit outputs (16
bits plus 2 bits for parity). It is initialized using the xc_props lines within the same
file.

module MY_ROM (
    din_0,
    dout_0,
    din_1,
    dout_1,
    adr_0,
    adr_1,
    w_en_0,
    w_en_1,
    clk
);

input  [15:0] din_0;
output [15:0] dout_0;
input  [15:0] din_1;
output [15:0] dout_1;
input  [9:0]  adr_0;
input  [9:0]  adr_1;
input         w_en_0;
input         w_en_1;

input clk /* synthesis syn_noclockbuf=1 */ ;

RAMB16_S18_S18 M0 (
    .DOA   (dout_0[15:0]),
    .DOB   (dout_1[15:0]),
    .DOPA  (),             // ignore the parity outputs
    .DOPB  (),             // ignore the parity outputs
    .ADDRA (adr_0),
    .ADDRB (adr_1),
    .CLKA  (clk),
    .CLKB  (clk),
    .DIA   (din_0[15:0]),
    .DIB   (din_1[15:0]),
    .DIPA  (2'b0),         // zero the parity inputs
    .DIPB  (2'b0),         // zero the parity inputs
    .ENA   (1'b1),
    .ENB   (1'b1),
    .SSRA  (1'b0),
    .SSRB  (1'b0),
    .WEA   (w_en_0),
    .WEB   (w_en_1)
) /* synthesis
xc_props="INIT_00=76931fac9dab2b36c248b87d6ae33f9a62d7183a5d5789e4b2d6b441e2411dc7,\
INIT_01=09e111c7e1e7acb6f8cac0bb2fc4c8bc2ae3baaab9165cc458e199cb89f51b13,\
INIT_02=5f7091a5abb0874df3e8cb4543a5eb93b0441e9ca4c2b0fb3d30875cbf29abd5,\
INIT_3e=1a0bf9b00ffd21b6210b11dc59ec947be86d11e10de2e980b8bc988e26aba269,\

*** Fill-In Missing Values ***

INIT_3f=ac6bd4cd2bf0471ffcb95377922449de5393850a00a57b47800d374d961dfeb5" */ ;

endmodule

3. The following code is a 16 x 32-bit ROM written in Verilog. It will
synthesize in Xilinx XST, but not in Synplify PRO.

module romverlog(input [3:0] raddr, output [31:0] slope_int);

// 16 words of 32 bits each, matching the 16 lines of memory.mem
reg [31:0] mem [0:15];

initial
begin
    // load the binary initialization values from the external file
    $readmemb("memory.mem", mem);
end

assign slope_int = mem[raddr];

endmodule

The associated memory.mem file is a simple binary text file with the memory
initialization values.

00000110010001000000000000000000
00000110001011010000000000000000
00000101111111110000000000000100
00000101101110100000000000001100
00000101011000000000000000011010
00000100111100010000000000101111
00000100011100000000000001001101
00000011110111110000000001110100
00000011001111110000000010100101
00000010100100110000000011100001
00000001110111100000000100100111
00000001001000010000000101110111
00000000011000000000000111001111
00000001110111100000000100100111
00000001001000010000000101110111
00000000011000000000000111001111

Author:  T.J.  Mack 

Date:  26  FEB  07 

E.9  MACRO  LATENCY  AND  SRC  OVERHEAD 

Problem:  When  implementing  a  macro,  SRC  requires  additional  clocks  to  accomplish 
overhead  operations.  The  overhead  appears  to  be  5  clock  cycles  to  pass  data  to  a  macro 
and  an  additional  5  clock  cycles  to  receive  data  from  a  macro.  One  would  expect  a  macro 
with  a  latency  of  3  to  take  a  total  of  13  clock  cycles.  However,  it  takes  only  12.  The  last 
clock  cycle  is  absorbed  into  the  5  clock  cycles  needed  to  receive  data  from  the  macro.  In 
this  case,  the  latency  in  the  info  file  must  be  set  equal  to  2,  even  though  the  schematic 
may  show  a  latency  of  3. 
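
The cycle accounting described above can be written out explicitly. This sketch generalizes from the single observed case (a schematic latency of 3 giving 12 cycles in total), so the rule that exactly one cycle is always absorbed is an assumption, and the names below are hypothetical.

```python
TRANSFER_IN_CYCLES = 5   # observed overhead to pass data to a macro
TRANSFER_OUT_CYCLES = 5  # observed overhead to receive data from a macro


def info_file_latency(schematic_latency):
    """Latency to enter in the info file.

    Assumption: the last clock cycle of the macro is absorbed into the
    receive overhead, so one cycle is subtracted from the schematic latency.
    """
    if schematic_latency < 1:
        raise ValueError("schematic latency must be at least 1")
    return schematic_latency - 1


def total_cycles(schematic_latency):
    """Total clock cycles for one macro call, including SRC overhead."""
    return (TRANSFER_IN_CYCLES
            + info_file_latency(schematic_latency)
            + TRANSFER_OUT_CYCLES)


print(total_cycles(3))  # the case described in the text
```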

Background: When developing the NFG, pipeline depth reports for the loop that calls
the NFG macro were always 10 clock cycles more than the latency of the macro.

Solution:  No  solution.  This  is  a  characteristic  of  the  SRC  architecture. 

Author:  T.J.  Mack 

Date:  26  FEB  07 

E.10 CANNOT USE PRIORITY SELECTOR GREATER THAN 128

Problem: When implementing a priority selector with 256 elements, each 64 bits wide, I
could not compile the .mc file. This is because the architecture already had three
64-bit-wide multipliers and other hardware that consumed some of the resources. However,
if you don’t need all 256 priority selector elements, it would be nice to have a selector
that is larger than 128 and smaller than 256.

Background: When implementing a priority selector with 150 elements, the only
option for a single selector is to use the 256-element selector, but that is 106 more
elements than required.

Solution:  Use  multiple  selectors  of  smaller  sizes. 

Author:  N.  Macaria 

Date:  26JUL07 
E.11 IF-THEN-ELSE STATEMENT WITH SRC PRIORITY SELECTORS

Problem: When implementing multiple priority selectors in the .mc file, SRC would not
accept an if-then-else statement that contains priority selectors in its body.

Background: The program would not compile if a priority selector was
used inside an if-then-else statement.

Solution:  Put  the  if-then-else  statement  prior  to  the  priority  selector,  use  a  variable  to 
store  the  selector  you  want  to  pick,  then  use  a  case  statement  to  reach  that  selector. 

Author:  N.  Macaria 

Date:  26JUL07 

E.12 FIND THE SLOW CODE IN MATLAB PROGRAMS

Problem: When running MATLAB programs, sometimes the code takes very long to
execute and you may not be sure where the problem exists.

Background: When running the chebyRemz program, there were portions of code that
would take very long to run.

Solution: Use the MATLAB Profiler to generate a profile summary report. The report lists
the total time and self time of every function, which shows where the program spends its time.

Author:  N.  Macaria 

Date:  26JUL07 
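APPENDIX F. SEGMENT ESTIMATION EQUATION


The segment estimation equation is derived from the Chebyshev
approximation error equation. Equation (0.6) is the general case:

$$\varepsilon = \frac{2(b-a)^{d+1}}{4^{d+1}\,(d+1)!}\,\max_x \left| f^{(d+1)}(x) \right| \qquad (0.6)$$

The variable d is the order of the approximation to be used, and (b-a) is the
estimated width of the segment. For the case of quadratic approximation, d=2:

$$\varepsilon = \frac{2(b-a)^{3}}{4^{3}\,(3)!}\,\max_x \left| f'''(x) \right|$$

Solving for the segment width gives

$$(b-a)^{3} = \frac{(4^{3} \times 3 \times 2)\,\varepsilon}{2 \times \max_x \left| f'''(x) \right|} = \frac{4^{3} \times 3\varepsilon}{\max_x \left| f'''(x) \right|}$$

$$\mathit{EstLenSeg} = (b-a) = 4 \times \sqrt[3]{\frac{3\varepsilon}{\max_x \left| f'''(x) \right|}} \qquad (0.3)$$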
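
As a numerical check of the derivation, the sketch below evaluates the estimated segment width and substitutes it back into the general error equation. It assumes the quadratic-case result reads EstLenSeg = 4 (3ε / max|f'''(x)|)^(1/3); the function names and the sample values are illustrative, not taken from the thesis code.

```python
import math


def chebyshev_error(width, max_deriv, d=2):
    """Error bound of equation (0.6) for a segment of the given width.

    max_deriv is the maximum of |f^(d+1)(x)| over the segment.
    """
    return 2 * width ** (d + 1) / (4 ** (d + 1) * math.factorial(d + 1)) * max_deriv


def est_len_seg(eps, max_third_deriv):
    """Estimated segment width for a quadratic (d = 2) approximation."""
    return 4.0 * (3.0 * eps / max_third_deriv) ** (1.0 / 3.0)


# Illustrative values: target error 2^-24, with max |f'''(x)| = 1 on the segment.
eps = 2.0 ** -24
width = est_len_seg(eps, 1.0)
print(f"estimated width: {width:.6f}")
print(f"error recovered from (0.6): {chebyshev_error(width, 1.0):.3e}")
```

Substituting the estimated width back into equation (0.6) recovers the target error, which confirms that the two equations are consistent with each other.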